[follow up on #1259] gpu witnessgen flow #1303

Open
hero78119 wants to merge 17 commits into feat/gpu-witnessgen from feat/gpu-witnessgen_flow


hero78119 (Collaborator) commented Apr 13, 2026

Summary of the data lifecycle during the entire proving flow:

  • During opcode assignment
    • raw shard GPU state is uploaded and kept resident:
      • StepRecord
      • shard metadata
      • shared shard buffers as needed
    • the GPU still runs assignment-time kernels for their side effects:
      • LK multiplicity
      • shardram / shared-circuit accumulation
    • the witness trace is no longer kept as an eager RMM
    • a per-chip replay plan is recorded instead
  • During commit_traces
    • no full witness set is resident up front
    • for each trace, deferred commit does:
      • regenerate that chip’s witness/device backing from resident raw shard GPU state
      • commit that one trace
      • drop that transient witness before moving to the next trace
    • after commit finishes, only raw shard GPU state remains resident
  • During per-chip proof
    • before a chip task proves, replay regenerates that chip’s witness/device backing from raw shard GPU state
    • chip proof uses it
    • task-local witness is dropped after that chip finishes
    • raw shard GPU state stays resident across all chip proofs
  • During PCS opening
    • replay regenerates the needed witness/device backing again from raw shard GPU state
    • opening uses it
    • transient witness is dropped afterward
  • At shard end
    • raw shard GPU state is released
    • replay/session metadata is invalidated
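The commit, chip-proof, opening, and shard-end steps above can be sketched in Rust roughly as follows. This is a minimal host-side model under stated assumptions, not the PR's code: `RawShardState`, `ReplayPlan`, `TransientWitness`, and `ShardSession` are hypothetical names, the "commitment" is a toy hash rather than a real PCS commit, and GPU memory is modeled as ordinary `Vec`s. A live counter checks that at most one regenerated witness exists at a time while the raw state stays resident until `finish_shard`.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical stand-in for the raw shard GPU state (StepRecord, shard metadata,
/// shared buffers), modeled here as plain host memory.
pub struct RawShardState {
    pub step_records: Vec<u64>,
}

/// Hypothetical per-chip replay plan recorded during opcode assignment.
pub struct ReplayPlan {
    pub num_rows: usize,
}

/// Counts how many transient witnesses are alive right now, and the peak.
#[derive(Default)]
struct LiveCounter {
    live: Cell<usize>,
    peak: Cell<usize>,
}

/// A regenerated witness; releasing it is tied to `Drop`.
pub struct TransientWitness {
    pub values: Vec<u64>,
    counter: Rc<LiveCounter>,
}

impl Drop for TransientWitness {
    fn drop(&mut self) {
        self.counter.live.set(self.counter.live.get() - 1);
    }
}

pub struct ShardSession {
    raw: Option<RawShardState>,
    plans: Vec<ReplayPlan>,
    counter: Rc<LiveCounter>,
}

impl ShardSession {
    /// Raw state plus replay plans is all that survives opcode assignment.
    pub fn new(raw: RawShardState, plans: Vec<ReplayPlan>) -> Self {
        Self { raw: Some(raw), plans, counter: Rc::new(LiveCounter::default()) }
    }

    /// Regenerates one chip's witness from the resident raw state.
    fn replay(&self, plan: &ReplayPlan) -> TransientWitness {
        let raw = self.raw.as_ref().expect("raw shard state already released");
        let values: Vec<u64> = raw.step_records.iter().take(plan.num_rows).copied().collect();
        let live = self.counter.live.get() + 1;
        self.counter.live.set(live);
        self.counter.peak.set(self.counter.peak.get().max(live));
        TransientWitness { values, counter: Rc::clone(&self.counter) }
    }

    /// Deferred commit: replay, commit, drop, one trace at a time.
    pub fn commit_traces(&self) -> Vec<u64> {
        self.plans
            .iter()
            .map(|plan| {
                let witness = self.replay(plan);
                // Toy polynomial hash standing in for a real PCS commitment.
                let commitment = witness
                    .values
                    .iter()
                    .fold(0u64, |acc, v| acc.wrapping_mul(31).wrapping_add(*v));
                commitment // `witness` is dropped here, before the next replay
            })
            .collect()
    }

    /// Per-chip proof: replay only that chip; the witness is task-local.
    pub fn prove_chip(&self, index: usize) -> usize {
        let witness = self.replay(&self.plans[index]);
        witness.values.len() // placeholder for the real chip proof
    }

    /// PCS opening: replay each needed witness again, one at a time.
    pub fn open(&self) -> usize {
        let mut opened_rows = 0;
        for plan in &self.plans {
            let witness = self.replay(plan);
            opened_rows += witness.values.len();
        } // each `witness` drops at the end of its iteration
        opened_rows
    }

    /// Shard end: release raw state and invalidate replay metadata.
    pub fn finish_shard(&mut self) {
        self.raw = None;
        self.plans.clear();
    }

    pub fn live_witnesses(&self) -> usize { self.counter.live.get() }
    pub fn peak_live_witnesses(&self) -> usize { self.counter.peak.get() }
    pub fn raw_resident(&self) -> bool { self.raw.is_some() }
}
```

Tying release to `Drop` means "drop that transient witness" happens at scope exit in each step, so the at-most-one-live property follows from the control flow rather than from explicit frees.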

So the intended steady-state invariant is:

  • persistent across shard proof:
    • raw shard GPU state only
  • transient on demand:
    • witness/device backing per trace / per chip / per opening step
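A rough consequence of this invariant for peak device memory, under simplifying assumptions (only one transient witness live at a time; allocations other than witnesses and raw state ignored): write $R$ for the raw shard GPU state and $W_1, \dots, W_n$ for the per-trace witness/device backings, and compare against keeping every witness resident alongside $R$.

```latex
M_{\text{all resident}} \approx |R| + \sum_{i=1}^{n} |W_i|
\qquad\text{vs.}\qquad
M_{\text{replay}} \approx |R| + \max_{1 \le i \le n} |W_i|
```

The price is recomputation: each $W_i$ is regenerated during commit, again for its chip proof, and again if the opening needs it. If several chip tasks prove concurrently, the max term grows toward the sum of the concurrently live $|W_i|$.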

Two nuances:

  • The initial assignment still runs GPU kernels, because their side effects are needed at that point, but in cache-none mode it no longer keeps eager witness RMMs.
  • The remaining OOM now occurs later, during chip proving rather than during commit, which is consistent with this lifecycle shift.
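The first nuance (kernels still run in cache-none mode, but their witness output is not retained) can be illustrated with a small sketch. It assumes a hypothetical `CacheMode` enum and toy `SideEffects`; the real LK multiplicity and shardram accumulation are replaced by a hash map and a running sum, and the kernel is a placeholder computation. What it shows is that both modes produce identical side effects while only the eager mode keeps the witness.

```rust
use std::collections::HashMap;

/// Hypothetical switch mirroring the eager vs. cache-none behaviour described above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum CacheMode {
    /// Keep the witness resident after assignment (the previous eager path).
    Eager,
    /// Keep only side effects; the witness is rebuilt later by replay.
    CacheNone,
}

/// Side effects that assignment must produce in either mode.
#[derive(Default)]
pub struct SideEffects {
    /// Toy lookup (LK) multiplicity table: looked-up value -> count.
    pub lk_multiplicity: HashMap<u64, usize>,
    /// Toy stand-in for shardram / shared-circuit accumulation.
    pub shared_accumulator: u64,
}

pub struct AssignOutput {
    /// `Some` only in eager mode.
    pub witness: Option<Vec<u64>>,
    /// What a later replay needs to regenerate the witness (here: just the row count).
    pub replay_rows: usize,
}

/// Stand-in for an assignment-time GPU kernel: builds rows and records side effects.
fn run_assignment_kernel(steps: &[u64], effects: &mut SideEffects) -> Vec<u64> {
    steps
        .iter()
        .map(|&step| {
            let row = step.wrapping_mul(2); // placeholder witness computation
            *effects.lk_multiplicity.entry(row % 16).or_insert(0) += 1;
            effects.shared_accumulator = effects.shared_accumulator.wrapping_add(row);
            row
        })
        .collect()
}

/// The kernel always runs; only the fate of its witness output depends on the mode.
pub fn assign_chip(steps: &[u64], mode: CacheMode, effects: &mut SideEffects) -> AssignOutput {
    let witness = run_assignment_kernel(steps, effects);
    let replay_rows = witness.len();
    let witness = match mode {
        CacheMode::Eager => Some(witness),
        CacheMode::CacheNone => None, // witness is dropped here; replay rebuilds it later
    };
    AssignOutput { witness, replay_rows }
}
```

Keeping the kernel call unconditional is what makes the side effects mode-independent; only retention changes, which is why the OOM moves out of commit.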

hero78119 force-pushed the feat/gpu-witnessgen_flow branch from 7eaf64c to c46a574 (April 13, 2026 07:59)
hero78119 force-pushed the feat/gpu-witnessgen_flow branch from 19a50f5 to aebf3b2 (April 13, 2026 09:41)
hero78119 force-pushed the feat/gpu-witnessgen_flow branch from ecc9007 to 0e99ebb (April 14, 2026 12:25)
hero78119 force-pushed the feat/gpu-witnessgen_flow branch from 0e99ebb to 8ae8010 (April 14, 2026 13:10)
hero78119 force-pushed the feat/gpu-witnessgen_flow branch from 79b60ad to e166b25 (April 16, 2026 12:40)
hero78119 force-pushed the feat/gpu-witnessgen_flow branch from d57f469 to fdb277e (April 17, 2026 02:14)

Labels

None yet

Projects

None yet

1 participant