[CI] Run test_scheduler on github-hosted with extra disk #279

Merged
YWHyuk merged 1 commit into develop from feature/ci-scheduler-disk on Jun 25, 2026
@YWHyuk YWHyuk commented Jun 25, 2026

Problem

test_scheduler fails on the github-hosted runner with:

riscv64-unknown-elf/bin/ld: final link failed: No space left on device

The test compiles resnet18 (channels_last) and EncoderBlock(768, 12) and launches each model twice. Every unique compiled kernel leaves artifacts behind, and they all accumulate within a single run:

  • a statically-linked RISC-V ELF (gem5 + spike target, newlib static -> several MB each)
  • binary.dump from objdump -d (full disassembly incl. libc)
  • gem5 m5out dirs and spike arg dumps

resnet18 alone has 20+ conv/bn/relu kernels, so these accumulate past the ~14G github-hosted root volume and the RISC-V link step runs out of space. Single-op tests stay small enough to pass.
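
One way to confirm this kind of accumulation is a diagnostic step that reports root-volume usage and the size of the artifact directory after the test. The following is a minimal sketch, not taken from this PR: the step name and the `ARTIFACT_DIR` value are assumptions for illustration.

```yaml
# Hypothetical diagnostic step; not part of this PR's workflow.
- name: Report disk usage (diagnostic)
  if: always()              # run even when the test step fails
  env:
    ARTIFACT_DIR: outputs   # assumed location of the per-kernel artifacts
  run: |
    # Free space left on the root volume
    df -h /
    # Total size of accumulated artifacts (ignored if the directory is missing)
    du -sh "$ARTIFACT_DIR" 2>/dev/null || true
```

Because the step uses `if: always()`, it still reports usage when the link step has already failed with "No space left on device".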

Fix

Scope changes to the test_scheduler job only:

  1. jlumbroso/free-disk-space step removes preinstalled tool caches (~25G recovered).
  2. Mount the larger /mnt scratch disk (~70G) over the container's outputs/ and /tmp so accumulated artifacts no longer land on root.

No test semantics or simulator behavior change.
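
For readers who have not seen the workflow diff, the two steps above might look roughly like the sketch below. This is an illustration under stated assumptions, not the actual workflow from this PR: the runner label, the action version (`@main`), the image name, the in-container paths, and the test command are all hypothetical. Only the `jlumbroso/free-disk-space` action, its `tool-cache` input, and the `/mnt` mount targets (`outputs/` and `/tmp`) come from the description.

```yaml
jobs:
  test_scheduler:
    runs-on: ubuntu-latest                      # assumed github-hosted runner label
    env:
      IMAGE: example/pytorchsim:latest          # hypothetical image name
    steps:
      # 1. Remove preinstalled tool caches to recover root-volume space.
      - name: Free preinstalled tool caches
        uses: jlumbroso/free-disk-space@main    # version pin is an assumption
        with:
          tool-cache: true

      # 2a. Create scratch directories on the larger /mnt disk.
      - name: Prepare scratch directories on /mnt
        run: |
          sudo mkdir -p /mnt/outputs /mnt/tmp
          sudo chmod 1777 /mnt/outputs /mnt/tmp

      # 2b. Bind-mount them over the container's outputs/ and /tmp.
      - name: Run test_scheduler
        run: |
          # Container paths and the test command are hypothetical.
          docker run --rm \
            -v /mnt/outputs:/workspace/PyTorchSim/outputs \
            -v /mnt/tmp:/tmp \
            "$IMAGE" \
            python tests/test_scheduler.py
```

Bind mounts keep the change local to this job: the test writes to the same paths inside the container, and only the backing disk differs.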

🤖 Generated with Claude Code

test_scheduler compiles resnet18 and EncoderBlock and launches each
model twice, so the per-kernel RISC-V ELFs, objdump disassembly dumps
and gem5 m5out directories accumulate within a single run. On the small
github-hosted root volume (~14G) this overflows during the RISC-V final
link step (ld: final link failed: No space left on device).

Free the preinstalled tool caches before the run and redirect the
PyTorchSim outputs/ and /tmp artifacts onto the larger /mnt scratch
disk (~70G) so the accumulated artifacts no longer fill the root volume.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01HAmdM9BrsTvfi8sZnnfNno
@YWHyuk YWHyuk force-pushed the feature/ci-scheduler-disk branch from 840a049 to b9dad83 Compare June 25, 2026 15:33
@YWHyuk YWHyuk merged commit 13c93a7 into develop Jun 25, 2026
1 check passed
@YWHyuk YWHyuk deleted the feature/ci-scheduler-disk branch June 25, 2026 15:33