fix: increase timeout robustness#93
Conversation
Two configuration tweaks that independently reduce the chance of a bulletin upload failing on the polkadot-api WS heartbeat or hitting the default Node heap ceiling. No behavioural change to upload code. - packages/cli/src/bulletin/store.ts: Pass heartbeatTimeout=300_000 and timeout=10_000 to getWsProvider. The 40_000 ms ws-provider default is shorter than a single Bulletin chunk's worst-case best-block inclusion latency we measured (>50 s outliers under contention), so the transport was tearing down healthy WS connections while the chain was still acknowledging an in-flight extrinsic. - .github/actions/bulletin/action.yml: Set NODE_OPTIONS=--max-old-space-size=4096 in both the authorize and upload steps. ubuntu-latest runners have ≥16 GiB RAM; default ~2 GiB Node heap leaves no headroom if a transient leak window opens. Cheap safety net.
Move chunk.bytes = null into a finally so a failed submission also releases its 2 MiB. Today the buffer is only freed on success; on failure (stall, retry, abort) it stays alive via the closure, pinning memory for the lifetime of the upload. The payload is re-streamed from disk on the next attempt, so dropping the in-memory copy is always safe.
CI Summary
Release - PassedTest this PR Download artifact (GitHub CLI required): gh run download 24462300780 -n cli-release-0.0.0-pr.93 -R paritytech/dotns-sdkInstall globally: npm install -g ./parity-dotns-cli-0.0.0-pr.93.tgzVerify: dotns --helpDeploy UI — Passed
Deploy Example — Passed
Labelspkg: cli, scope: bulletin Test - Passed134 passed, 0 failed across 134 tests. |
| // success it is already persisted; on failure it must not pin | ||
| // memory while the caller decides whether to retry — the | ||
| // payload is re-streamed from disk on the next attempt. | ||
| chunk.bytes = null as unknown as Uint8Array; |
There was a problem hiding this comment.
Don't we want only null if there is success?
We could also restructure so in-wave failures reread the bytes from disk, otherwise that is only happening at the wave boundaries now if I am not mistaken (via runWaveWithRetries).
With this change as is, transient failures won't recover on retry
There was a problem hiding this comment.
happy to seperate this bit just now and focus on the timeouts. I took this bit out of a bigger error handling piece which might be needed for it to make sense. The timeout stuff should at least help.
There was a problem hiding this comment.
reverted it to focus on the timout orchestration in the first instance. Error handling/memory release I'll move to a seperate pr (if we want it)
This reverts commit 4e9b6a8.
|
lets see what the affect is. should be at least better |
Description
Type
Package
@parity/dotns-cliRelated Issues
Fixes
Checklist
Code
bun run lintpassesbun run formatpassesbun run typecheckpassesDocumentation
Breaking Changes
Breaking changes:
Testing
How to test:
Notes