25 commits
d9b0d2d
refactor(dispatch_combine): minimize FlyDSL core deltas, inline helpers
yanboshao May 17, 2026
45fd068
style: satisfy pre-checks (black + ruff)
yanboshao May 17, 2026
0ddc2a0
ci(dispatch_combine): restrict tests to 8-GPU runners
yanboshao May 18, 2026
48177f4
fix(ci): restore arith.py style-compliant import layout
yanboshao May 24, 2026
988e461
feat(dispatch_combine): in-kernel grid barrier + dynamic arch/LDS check
yanboshao May 21, 2026
7e0cac8
feat(dispatch_combine): verify_self with mori-parity byte verify
yanboshao May 24, 2026
519269e
chore(dispatch_combine): reserve combine_no_stage1 and refresh CI swe…
yanboshao May 24, 2026
fe3cb1c
refactor(dispatch_combine): align intranode op with mori parity surface
yanboshao May 25, 2026
2dd0661
style: satisfy black + ruff pre-checks
yanboshao May 25, 2026
e580947
refactor(dispatch_combine): drop frozen launch-geometry params + slim…
yanboshao May 25, 2026
b48a393
fix(dispatch_combine): wpb=16 wide-path no-op + verify_self desync ha…
yanboshao May 26, 2026
0dd6e07
style: satisfy black on multi-line skip-print
yanboshao May 26, 2026
126f93f
rm dead code
yanboshao May 26, 2026
64fc9c4
fix(dispatch_combine): cache-key combine_launch dtype via str(...) in…
yanboshao May 26, 2026
4491237
feat(dispatch_combine): mori-parity P2P-read with launch-time dtype J…
yanboshao May 26, 2026
5c49b80
perf(dispatch_combine): 396-case mixed-dtype sweep report
yanboshao May 26, 2026
e91020f
perf(dispatch_combine): rerun 396-case sweep with mori head-to-head
yanboshao May 26, 2026
625440c
refactor(dispatch_combine): CI sweep cleanup + test-harness hardening
yanboshao May 27, 2026
f079e40
docs(ci): align flydsl.yaml comment with zero-copy combine rename
yanboshao May 27, 2026
85dc549
fix(dispatch_combine): plumb combine_dtype_key through mori-parity pr…
yanboshao May 27, 2026
0de608b
feat(dispatch_combine): add bs={4,32,8K} fp8->bf16 mixed CI cases + m…
yanboshao May 28, 2026
489681a
chore(dispatch_combine): drop perf-sweep working artifacts from PR (k…
yanboshao May 28, 2026
abf5838
fix mi325 CI issues
yanboshao May 29, 2026
4f1cb3d
refactor(dispatch_combine): parameterise Stage 3 unroll + slim comments
yanboshao Jun 1, 2026
2fb9e6e
fix(dispatch_combine): scatter all scale i32 slots + restore out_scal…
yanboshao Jun 2, 2026
16 changes: 14 additions & 2 deletions .github/workflows/flydsl.yaml
@@ -350,15 +350,15 @@ jobs:
docker rm flydsl_test

# ---------------------------------------------------------------------------
-# Multi-GPU allreduce tests: ONLY for 8-GPU runners.
+# Multi-GPU communication operator tests: ONLY for 8-GPU runners.
# Runs on BOTH linux-flydsl-mi325-8 AND linux-flydsl-mi355-8 independently.
# Triggered when PR has label "multi-gpu" (added by a maintainer), or when
# the workflow is manually dispatched.
# fail-fast: false ensures both runners always complete even if one fails.
# ---------------------------------------------------------------------------
multi-gpu:
needs: test
-name: Multi-GPU AllReduce Tests (${{ matrix.runners }})
+name: Multi-GPU Communication Operator Tests (${{ matrix.runners }})
timeout-minutes: 120
env:
# Keep checkout independent of runner-local git-cache rewrites.
@@ -469,6 +469,18 @@ jobs:
-m multi_gpu -v --no-header --tb=short
"

+- name: Run multi-GPU dispatch/combine CI sweep (8-GPU, accuracy + cudagraph perf)
+timeout-minutes: 30
+run: |
+docker exec flydsl_test bash -c "
+cd /flydsl-test &&
+python tests/kernels/test_profiler_dispatch_combine.py \
+--ci-sweep \
+--world-size 8 \
+--port 29503 \
+--output-dir /tmp/flydsl_ci_sweep
+"

- name: Run multi-GPU allreduce tests
timeout-minutes: 30
run: |
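
The dispatch/combine sweep step in this hunk passes four flags to `tests/kernels/test_profiler_dispatch_combine.py`: `--ci-sweep`, `--world-size`, `--port`, and `--output-dir`. The script's own argument handling is not part of this diff, so the following is only a minimal sketch of how such a command line could be parsed with `argparse`. The function name `build_parser`, the defaults, and the help strings are assumptions made for illustration, not the script's actual interface.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags used by the CI step."""
    parser = argparse.ArgumentParser(
        description="Sketch of a dispatch/combine sweep command line (illustrative only)."
    )
    # Boolean switch: present means "run the CI sweep" (assumed semantics).
    parser.add_argument("--ci-sweep", action="store_true",
                        help="run the reduced CI sweep")
    # Number of participating ranks; the CI step passes 8 for 8-GPU runners.
    parser.add_argument("--world-size", type=int, default=8,
                        help="number of ranks/GPUs")
    # Rendezvous port; the CI step passes 29503.
    parser.add_argument("--port", type=int, default=29503,
                        help="rendezvous port")
    # Directory for sweep results; the CI step passes /tmp/flydsl_ci_sweep.
    parser.add_argument("--output-dir", default="/tmp/flydsl_ci_sweep",
                        help="where to write sweep results")
    return parser


# Parse the same argument vector that the workflow step supplies.
args = build_parser().parse_args([
    "--ci-sweep",
    "--world-size", "8",
    "--port", "29503",
    "--output-dir", "/tmp/flydsl_ci_sweep",
])
print(args.ci_sweep, args.world_size, args.port, args.output_dir)
```

Passing the port explicitly in the workflow presumably keeps this step's rendezvous from colliding with other multi-GPU steps that run in the same container, though the diff does not say so.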