Skip to content

fix: reduce Docker layers, add auto CI trigger, fix fake ops import#363

Merged
shijieliu merged 7 commits intoNVIDIA:mainfrom
JacoCheung:fix/reduce-docker-layers
Apr 17, 2026
Merged

fix: reduce Docker layers, add auto CI trigger, fix fake ops import#363
shijieliu merged 7 commits intoNVIDIA:mainfrom
JacoCheung:fix/reduce-docker-layers

Conversation

@JacoCheung
Copy link
Copy Markdown
Collaborator

@JacoCheung JacoCheung commented Apr 14, 2026

Summary

  • Merge RUN instructions in docker/Dockerfile to reduce total layer count, fixing the overlay2 128-layer limit (max depth exceeded) on CI nodes. Saves 12 layers (~119 total).
  • Add pull_request_target trigger to blossom-ci.yml so CI runs automatically on PR open/update (no need to manually comment /build)
  • Cherry-pick fix for fake ops wrapper import used in torch export (from @geoffreyQiu)
  • Enhance /build command: support /build devel and /build nightly flags to pass BUILD_DEVEL=1 / NIGHTLY_TEST=1 to GitLab pipeline (companion change in GitLab MR !125)

Dockerfile Changes

Stage Before After Saved
devel stage RUN 8 4 -4
build stage RUN 4 1 -3
Total new layers 21 9 -12

blossom-ci.yml Changes

  • Added pull_request_target: [opened, synchronize] trigger
  • Updated if condition: startsWith(comment.body, '/build') to support /build devel, /build nightly, /build devel nightly

/build Flag Support

Command Effect
/build Normal CI (unchanged)
/build devel Rebuild base Docker images (BUILD_DEVEL=1)
/build nightly Run 8-GPU nightly tests (NIGHTLY_TEST=1)
/build devel nightly Both

CI

Test plan

  • CI inference_build + inference_test_1gpu pass (no more max depth exceeded)
  • train_build + unit tests unaffected
  • Auto CI trigger works on PR open/sync after merge to main
  • /build devel triggers pipeline with BUILD_DEVEL=1 (requires GitLab MR !125 merged first)

🤖 Generated with Claude Code

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Apr 14, 2026

Greptile Summary

This PR consolidates Dockerfile RUN instructions from ~21 to ~9 layers (saving 12 layers) to fix the overlay2 max depth exceeded error on CI nodes, extends the /build comment trigger to support startsWith matching for /build devel and /build nightly flags, and cherry-picks the correct fake-ops import ordering needed for torch.export.

  • The PR description states that a pull_request_target: [opened, synchronize] trigger was added to blossom-ci.yml, but it is absent from the actual diff and the current file — the workflow still only fires on issue_comment and workflow_dispatch. The test plan item "Auto CI trigger works on PR open/sync after merge to main" remains unchecked, suggesting this feature was not implemented in this PR.

Confidence Score: 5/5

Safe to merge — all functional changes are correct; the only finding is a P2 description/code mismatch.

No P0 or P1 findings. The Dockerfile layer consolidation is functionally equivalent to the original. The Python fake-ops import ordering is correct and uses the established isort: off/on pattern. The startsWith change in blossom-ci.yml is a clean and safe extension of the existing trigger. The single P2 comment is about the PR description claiming a feature (pull_request_target trigger) that is not actually present in the code — this does not affect runtime behavior.

.github/workflows/blossom-ci.yml — verify whether the pull_request_target trigger was intentionally omitted or is a missing implementation.

Important Files Changed

Filename Overview
.github/workflows/blossom-ci.yml Updated if condition from exact /build match to startsWith to support /build devel and /build nightly flags; PR description claims pull_request_target trigger was added but it is absent from the actual file.
docker/Dockerfile Merged 8 devel-stage RUN instructions into 4 and 4 build-stage RUN instructions into 1, reducing total Docker layers by 12 to resolve the overlay2 128-layer limit; changes are functionally equivalent to the original.
examples/hstu/modules/exportable_embedding.py Added ordered fake-ops imports (dynamicemb meta, hstu_cuda_ops, fake_hstu_cuda_ops) under isort: off/on guard to ensure correct op registration sequence before torch.export.
examples/hstu/ops/fused_hstu_op.py Added import hstu.hstu_ops_gpu to register fake implementations needed for torch.export; minimal and correct change.

Sequence Diagram

sequenceDiagram
    participant Dev as Developer
    participant GH as GitHub PR
    participant BCI as Blossom CI (GH Action)
    participant GL as GitLab Pipeline

    Dev->>GH: Comment /build [devel|nightly]
    GH->>BCI: issue_comment event (created)
    BCI->>BCI: Authorization: actor in allowlist?
    BCI->>BCI: Authorization: startsWith(body, '/build')?
    BCI->>BCI: Vulnerability scan (checkout + blossom-action)
    BCI->>GL: START-CI-JOB (blossom-ci, passes flags)
    GL-->>GH: Pipeline status reported back
Loading

Reviews (13): Last reviewed commit: "ci: remove pull_request_target trigger, ..." | Re-trigger Greptile

Comment thread docker/Dockerfile Outdated
@JacoCheung JacoCheung force-pushed the fix/reduce-docker-layers branch from b816ab3 to df8ec8a Compare April 14, 2026 14:13
Comment thread docker/Dockerfile Outdated
Aggressively merge RUN instructions in the Dockerfile to reduce total
layer count from ~126 to ~119. The inference image was hitting the
overlay2 128-layer limit ("failed to register layer: max depth
exceeded") on CI nodes.

devel stage: 8 RUN + 1 COPY -> 4 RUN + 1 COPY (-4 layers)
build stage: 4 RUN + 1 COPY -> 1 RUN + 1 COPY (-3 layers)
FBGEMM and TorchRec kept as separate layers for build cache efficiency.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@JacoCheung JacoCheung force-pushed the fix/reduce-docker-layers branch from df8ec8a to a105838 Compare April 14, 2026 14:37
Comment thread .github/workflows/gitlab-ci-bridge.yml Outdated
@JacoCheung
Copy link
Copy Markdown
Collaborator Author

/build

3 similar comments
@EmmaQiaoCh
Copy link
Copy Markdown
Collaborator

/build

@JacoCheung
Copy link
Copy Markdown
Collaborator Author

/build

@shijieliu
Copy link
Copy Markdown
Collaborator

/build

@shijieliu
Copy link
Copy Markdown
Collaborator

/ci

1 similar comment
@JacoCheung
Copy link
Copy Markdown
Collaborator Author

/ci

@JacoCheung
Copy link
Copy Markdown
Collaborator Author

/build

@JacoCheung
Copy link
Copy Markdown
Collaborator Author

/ci

@JacoCheung JacoCheung force-pushed the fix/reduce-docker-layers branch from 0279dc3 to a105838 Compare April 15, 2026 08:57
@shijieliu
Copy link
Copy Markdown
Collaborator

/build

13 similar comments
@EmmaQiaoCh
Copy link
Copy Markdown
Collaborator

/build

@JacoCheung
Copy link
Copy Markdown
Collaborator Author

/build

@JacoCheung
Copy link
Copy Markdown
Collaborator Author

/build

@JacoCheung
Copy link
Copy Markdown
Collaborator Author

/build

@JacoCheung
Copy link
Copy Markdown
Collaborator Author

/build

@JacoCheung
Copy link
Copy Markdown
Collaborator Author

/build

@JacoCheung
Copy link
Copy Markdown
Collaborator Author

/build

@JacoCheung
Copy link
Copy Markdown
Collaborator Author

/build

@JacoCheung
Copy link
Copy Markdown
Collaborator Author

/build

@JacoCheung
Copy link
Copy Markdown
Collaborator Author

/build

@JacoCheung
Copy link
Copy Markdown
Collaborator Author

/build

@JacoCheung
Copy link
Copy Markdown
Collaborator Author

/build

@JacoCheung
Copy link
Copy Markdown
Collaborator Author

/build

@JacoCheung
Copy link
Copy Markdown
Collaborator Author

/build

2 similar comments
@JacoCheung
Copy link
Copy Markdown
Collaborator Author

/build

@JacoCheung
Copy link
Copy Markdown
Collaborator Author

/build

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@JacoCheung JacoCheung requested a review from shijieliu April 15, 2026 15:51
@JacoCheung
Copy link
Copy Markdown
Collaborator Author

/build

@JacoCheung
Copy link
Copy Markdown
Collaborator Author

/build

@JacoCheung
Copy link
Copy Markdown
Collaborator Author

/build

@JacoCheung JacoCheung changed the title fix: reduce Docker image layers to avoid overlay2 max depth fix: reduce Docker layers, add auto CI trigger, fix fake ops import Apr 16, 2026
@JacoCheung
Copy link
Copy Markdown
Collaborator Author

JacoCheung commented Apr 16, 2026

Pipeline #48650175 -- canceling

Job Status Log
pre_check ✅ success view
train_build ✅ success view
inference_build ✅ success view
tritonserver_build ✅ success view
build_whl ❌ failed view
dynamicemb_test_fwd_bwd_8gpus ✅ success view
dynamicemb_test_load_dump_8gpus ❔ canceling view
unit_test_1gpu_a100 ❌ failed view
unit_test_1gpu_h100 ❌ failed view
unit_test_4gpu ❌ failed view
unit_test_tp_4gpu ❌ failed view
L20_unit_test_1gpu ❌ failed view
inference_unit_test_1gpu ✅ success view
inference_test_1gpu ❌ failed view

View full pipeline

The module hstu.hstu_ops_gpu does not exist as a Python module.
The C++ source hstu_ops_gpu.cpp compiles into hstu/fbgemm_gpu_experimental_hstu.so,
not a separate hstu_ops_gpu submodule. This import was incorrectly added in PR NVIDIA#327
and causes ModuleNotFoundError in CI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@JacoCheung
Copy link
Copy Markdown
Collaborator Author

/build

Comment thread .github/workflows/blossom-ci.yml Outdated
@JacoCheung
Copy link
Copy Markdown
Collaborator Author

/build

@JacoCheung
Copy link
Copy Markdown
Collaborator Author

JacoCheung commented Apr 16, 2026

Pipeline #48656842 -- failed

Job Status Log
pre_check ✅ success view
train_build ✅ success view
inference_build ✅ success view
tritonserver_build ✅ success view
build_whl ✅ success view
dynamicemb_test_fwd_bwd_8gpus ❌ failed view
dynamicemb_test_load_dump_8gpus ✅ success view
unit_test_1gpu_a100 ✅ success view
unit_test_4gpu ❌ failed view
unit_test_tp_4gpu ❌ failed view
L20_unit_test_1gpu ✅ success view
inference_unit_test_1gpu ✅ success view
inference_test_1gpu ❌ failed view
unit_test_1gpu_h100 ✅ success view

Result: 10/14 jobs passed

View full pipeline

Update from 04df536 to 65bad42 which adds fake tensor implementations
for torch.export (hstu_ops_gpu.py). This was missing since PR NVIDIA#340
accidentally reverted the submodule pointer.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@JacoCheung
Copy link
Copy Markdown
Collaborator Author

/build

@JacoCheung
Copy link
Copy Markdown
Collaborator Author

JacoCheung commented Apr 16, 2026

Pipeline #48669643 -- failed

Job Status Log
pre_check ✅ success view
train_build ✅ success view
inference_build ✅ success view
tritonserver_build ✅ success view
build_whl ❌ failed view
dynamicemb_test_fwd_bwd_8gpus ❌ failed view
dynamicemb_test_load_dump_8gpus ✅ success view
unit_test_1gpu_a100 ❌ failed view
unit_test_1gpu_h100 ❌ failed view
unit_test_4gpu ❌ failed view
unit_test_tp_4gpu ❌ failed view
L20_unit_test_1gpu ✅ success view
inference_unit_test_1gpu ✅ success view
inference_test_1gpu ✅ success view

Result: 8/14 jobs passed

View full pipeline

@JacoCheung
Copy link
Copy Markdown
Collaborator Author

/build

@JacoCheung
Copy link
Copy Markdown
Collaborator Author

JacoCheung commented Apr 16, 2026

Pipeline #48680124 -- failed

Job Status Log
pre_check ✅ success view
train_build ✅ success view
inference_build ✅ success view
tritonserver_build ✅ success view
build_whl ✅ success view
dynamicemb_test_fwd_bwd_8gpus ✅ success view
dynamicemb_test_load_dump_8gpus ✅ success view
unit_test_1gpu_a100 ❌ failed view
unit_test_1gpu_h100 ❌ failed view
unit_test_4gpu ❌ failed view
unit_test_tp_4gpu ❌ failed view
L20_unit_test_1gpu ✅ success view
inference_unit_test_1gpu ✅ success view
inference_test_1gpu ✅ success view

Result: 10/14 jobs passed

View full pipeline

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@JacoCheung
Copy link
Copy Markdown
Collaborator Author

/build

@JacoCheung
Copy link
Copy Markdown
Collaborator Author

JacoCheung commented Apr 16, 2026

Pipeline #48697202 -- failed

Job Status Log
pre_check ✅ success view
train_build ❌ failed view
inference_build ✅ success view
tritonserver_build ✅ success view
build_whl ❌ failed view
dynamicemb_test_fwd_bwd_8gpus ❌ failed view
dynamicemb_test_load_dump_8gpus ❌ failed view
unit_test_1gpu_a100 ❌ failed view
unit_test_1gpu_h100 ❌ failed view
unit_test_4gpu ❌ failed view
unit_test_tp_4gpu ❌ failed view
L20_unit_test_1gpu ❌ failed view
inference_unit_test_1gpu ✅ success view
inference_test_1gpu ✅ success view

Result: 5/14 jobs passed

View full pipeline

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@JacoCheung
Copy link
Copy Markdown
Collaborator Author

/build

@JacoCheung
Copy link
Copy Markdown
Collaborator Author

JacoCheung commented Apr 16, 2026

Pipeline #48700740 -- failed

Job Status Log
pre_check ✅ success view
train_build ✅ success view
inference_build ✅ success view
tritonserver_build ✅ success view
build_whl ✅ success view
dynamicemb_test_fwd_bwd_8gpus ✅ success view
dynamicemb_test_load_dump_8gpus ✅ success view
unit_test_1gpu_a100 ✅ success view
unit_test_1gpu_h100 ❌ failed view
unit_test_4gpu ❌ failed view
unit_test_tp_4gpu ❌ failed view
L20_unit_test_1gpu ✅ success view
inference_unit_test_1gpu ✅ success view
inference_test_1gpu ✅ success view

Result: 11/14 jobs passed

View full pipeline

@shijieliu shijieliu merged commit bb398ee into NVIDIA:main Apr 17, 2026
0 of 3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants