Merged
Commits
20 commits
aa3a357
fix(kubeflow): stream only rank 0 + last rank, write all ranks to disk
ko3n1g May 30, 2026
8e1930e
fix(kubeflow): resolve last pod via completion-index label + full-his…
ko3n1g May 30, 2026
b2be85c
fix(kubeflow): forward by global rank (node_rank*nproc+local), not po…
ko3n1g May 30, 2026
56bbb4b
fix(kubeflow): make TrainJob launch idempotent on 409 conflict
ko3n1g May 30, 2026
3f0f5b4
fix(kubeflow): reload kube client across cert rotation for long runs
ko3n1g May 30, 2026
b597e6c
fix(kubeflow): scope code_dir per job to avoid concurrent clobber
ko3n1g May 30, 2026
636ec99
fix(kubeflow): unique TrainJob name + forward all ranks (deduped)
ko3n1g May 30, 2026
7af08d2
fix(kubeflow): stream logs once, not per replica
ko3n1g May 31, 2026
22168ed
fix(kubeflow): forward rank 0 + last rank to stdout (not all-ranks de…
ko3n1g May 31, 2026
c0d800d
fix(kubeflow): forward rank-0 + the actual loss-rank slot to stdout
ko3n1g May 31, 2026
e8a64f5
fix(kubeflow): robust log streaming across pod/container restarts
ko3n1g May 31, 2026
ec27ed5
fix(kubeflow): resolve rank-0/last pods from worker GROUP_RANK, not a…
ko3n1g May 31, 2026
213ba39
fix(kubeflow): emit forwarded log lines in timestamp order
ko3n1g May 31, 2026
981e6f9
feat(kubeflow): support pod-template annotations/labels (podTemplateO…
ko3n1g May 31, 2026
c23cecf
fix(kubeflow): resolve rank-0 and last rank before forwarding logs
ko3n1g May 31, 2026
2b344dc
fix(kubeflow): wait for rank-0/last to resolve, never fall back to co…
ko3n1g Jun 1, 2026
2870f12
style(kubeflow): ruff-format kubeflow.py
ko3n1g Jun 1, 2026
b6c3d8f
test(kubeflow): update stale tests for uuid names, idempotent 409, ra…
ko3n1g Jun 1, 2026
4e9346f
test(kubeflow): cover GROUP_RANK resolution, log forwarding, client r…
ko3n1g Jun 1, 2026
71461c5
feat(kubeflow): add copy_to_workspace/copy_from_workspace for arbitra…
ko3n1g Jun 2, 2026
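Several commits above describe the rank arithmetic used for log forwarding: ranks are identified by global rank (`node_rank*nproc+local`), only rank 0 and the last rank are streamed to stdout, and all ranks are written to disk. The following is a minimal sketch of that arithmetic. It assumes torchrun-style `node_rank`/`local_rank`/`nproc_per_node` semantics. `RankSlot`, `global_rank`, `ranks_to_forward`, and `should_forward` are hypothetical names for illustration, not the helpers in `kubeflow.py`.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RankSlot:
    """Hypothetical (node_rank, local_rank) pair identifying one training process."""

    node_rank: int   # index of the worker node/pod (assumed to come from GROUP_RANK)
    local_rank: int  # process index within that node


def global_rank(node_rank: int, local_rank: int, nproc_per_node: int) -> int:
    """Global rank computed as node_rank * nproc_per_node + local_rank."""
    if not 0 <= local_rank < nproc_per_node:
        raise ValueError("local_rank must be in [0, nproc_per_node)")
    return node_rank * nproc_per_node + local_rank


def ranks_to_forward(num_nodes: int, nproc_per_node: int) -> set[int]:
    """Global ranks whose logs go to stdout: rank 0 and the last rank.

    A set dedupes the single-process case, where rank 0 is also the last rank.
    """
    world_size = num_nodes * nproc_per_node
    return {0, world_size - 1}


def should_forward(slot: RankSlot, num_nodes: int, nproc_per_node: int) -> bool:
    """True if this slot's output is streamed; every slot is still written to disk."""
    rank = global_rank(slot.node_rank, slot.local_rank, nproc_per_node)
    return rank in ranks_to_forward(num_nodes, nproc_per_node)


if __name__ == "__main__":
    # 2 nodes x 8 processes per node -> world size 16
    print(sorted(ranks_to_forward(2, 8)))        # [0, 15]
    print(should_forward(RankSlot(1, 7), 2, 8))  # True  (global rank 15)
    print(should_forward(RankSlot(1, 0), 2, 8))  # False (global rank 8)
```

Keeping the forwarded set to two ranks bounds stdout volume regardless of world size. Commit c0d800d later forwards "the actual loss-rank slot" instead of assuming it is the last rank. This sketch does not model that refinement.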