Commit ac826a6
feat(cuda): add Blackwell sm_100 target + persistent mempool tuning
- Add sm_100 (Blackwell B200/B300) to multi-arch fallback in build.rs
- Increase default mempool release threshold to 1GB for persistent workloads
- Add AsyncPoolConfig::for_persistent_actors() (4GB threshold, 256MB pre-alloc)
- Add AsyncPoolConfig::for_batch_processing() (128MB threshold, 32MB pre-alloc)
- Based on NVIDIA guidance for stream-ordered allocators with persistent kernels:
a high release threshold keeps freed memory reserved in the pool instead of returning it to the OS at synchronization points during sustained operation
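The presets and the release-threshold behavior can be modeled in plain Rust. This is a hypothetical sketch, not the crate's actual `AsyncPoolConfig` definition. The field names `release_threshold` and `prealloc_bytes`, the zero pre-allocation in `Default`, reading "1GB" as 1 GiB, and the `PoolModel` type are all assumptions. `PoolModel::synchronize` mimics how a CUDA memory pool, at a synchronization point, trims reserved-but-unused memory down to its release threshold and no further.

```rust
// Hypothetical model of the pool presets; field names are illustrative only.
const MIB: u64 = 1 << 20;
const GIB: u64 = 1 << 30;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct AsyncPoolConfig {
    /// Bytes the pool may keep reserved after a sync before returning memory to the OS.
    pub release_threshold: u64,
    /// Bytes reserved up front when the pool is created.
    pub prealloc_bytes: u64,
}

impl Default for AsyncPoolConfig {
    fn default() -> Self {
        // 1 GiB threshold per this commit; zero pre-allocation is an assumption.
        Self { release_threshold: GIB, prealloc_bytes: 0 }
    }
}

impl AsyncPoolConfig {
    pub fn for_persistent_actors() -> Self {
        Self { release_threshold: 4 * GIB, prealloc_bytes: 256 * MIB }
    }

    pub fn for_batch_processing() -> Self {
        Self { release_threshold: 128 * MIB, prealloc_bytes: 32 * MIB }
    }
}

/// Host-side model of a stream-ordered pool: tracks reserved vs. in-use bytes.
pub struct PoolModel {
    cfg: AsyncPoolConfig,
    reserved: u64,
    in_use: u64,
}

impl PoolModel {
    pub fn new(cfg: AsyncPoolConfig) -> Self {
        Self { cfg, reserved: cfg.prealloc_bytes, in_use: 0 }
    }

    pub fn alloc(&mut self, bytes: u64) {
        self.in_use += bytes;
        // Grow the reservation only when free reserved memory is insufficient.
        if self.in_use > self.reserved {
            self.reserved = self.in_use;
        }
    }

    pub fn free(&mut self, bytes: u64) {
        // Freed memory stays reserved in the pool; nothing goes back to the OS yet.
        self.in_use -= bytes;
    }

    /// At a sync point, release unused memory, but only down to the threshold.
    pub fn synchronize(&mut self) {
        if self.reserved > self.cfg.release_threshold {
            self.reserved = self.cfg.release_threshold.max(self.in_use);
        }
    }

    pub fn reserved(&self) -> u64 {
        self.reserved
    }
}

fn main() {
    let mut actors = PoolModel::new(AsyncPoolConfig::for_persistent_actors());
    let mut batch = PoolModel::new(AsyncPoolConfig::for_batch_processing());
    for pool in [&mut actors, &mut batch] {
        pool.alloc(2 * GIB);
        pool.free(2 * GIB);
        pool.synchronize();
    }
    // The 4 GiB threshold keeps the 2 GiB working set reserved across the sync;
    // the 128 MiB threshold trims the batch pool back down.
    println!("actors: {} MiB, batch: {} MiB", actors.reserved() / MIB, batch.reserved() / MIB);
}
```

Running it prints `actors: 2048 MiB, batch: 128 MiB`: after the same 2 GiB burst, the persistent-actor pool keeps its memory warm for the next allocation, while the batch pool hands most of it back.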
Also confirmed: CUDA Graph conditional nodes (IF/WHILE/SWITCH) are available in
cudarc 0.19.3 via CU_GRAPH_NODE_TYPE_CONDITIONAL + cuGraphConditionalHandleCreate.
Documented as a future optimization path for GPU-side actor state machines.
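A rough host-side illustration of the WHILE semantics follows. It is not cudarc code and launches nothing on a GPU. A WHILE conditional node runs its body graph as long as the conditional handle's value is nonzero, checking the value before each iteration. Kernels inside the body update that value (in device code, via `cudaGraphSetConditional`). The `ActorState` enum, the `body` transition function, the `ConditionalHandle` struct, and the default value of 1 are hypothetical. They are chosen only to show how an actor state machine could drive the loop without returning to the host between steps.

```rust
/// Hypothetical actor states for a GPU-side state machine.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ActorState {
    Idle,
    Receiving,
    Processing,
    Done,
}

/// Stand-in for a conditional handle: a single u32 that the body writes to.
struct ConditionalHandle {
    value: u32,
}

/// Models one launch of the body graph: advance the state and, like a kernel
/// calling cudaGraphSetConditional, set the handle to 0 once the actor is done.
fn body(state: &mut ActorState, handle: &mut ConditionalHandle) {
    *state = match *state {
        ActorState::Idle => ActorState::Receiving,
        ActorState::Receiving => ActorState::Processing,
        ActorState::Processing => ActorState::Done,
        ActorState::Done => ActorState::Done,
    };
    handle.value = u32::from(*state != ActorState::Done);
}

/// Models a WHILE node: the condition is checked before each body launch,
/// so the body runs zero times if the handle starts at 0.
fn run_while_node(state: &mut ActorState, handle: &mut ConditionalHandle) -> usize {
    let mut launches = 0;
    while handle.value != 0 {
        body(state, handle);
        launches += 1;
    }
    launches
}

fn main() {
    let mut state = ActorState::Idle;
    // Assumed default launch value of 1 so the body runs at least once.
    let mut handle = ConditionalHandle { value: 1 };
    let launches = run_while_node(&mut state, &mut handle);
    println!("{launches} body launches, final state {:?}", state);
}
```

Here the body runs three times (Idle to Receiving to Processing to Done), and the loop exits when the body clears the handle. On a GPU, the same decision would be made by the graph itself rather than by a host round-trip per step.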
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent d625e03
2 files changed
Lines changed: 36 additions & 2 deletions
Diff hunks (changed line content not captured):
- File 1, @@ -213,7 +213,12 @@: line 216 replaced by 6 new lines
- File 1, @@ -223,6 +228,8 @@: 2 lines added after line 225
- File 2, @@ -43,7 +43,34 @@: line 46 replaced by 28 new lines