Dc agent main #6

Draft
tohtana wants to merge 10 commits into master from dc_agent_main
@tohtana tohtana commented Apr 4, 2026

Dc agent main

tohtana and others added 10 commits April 2, 2026 13:47
Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>
Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>
Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>
Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>
deepspeedai#7948)

…U accelerator

The on-device flatten path (introduced in deepspeedai#7828) passes nn.Parameter
objects with requires_grad=True to torch.cat(), creating a flat buffer
with CatBackward0 grad_fn. Later, _unflatten_dense_tensors produces
SplitBackward0 views that are assigned to model params. In-place copy_()
on these views during the optimizer step raises:

    RuntimeError: Output 0 of SplitBackward0 is a view and is being modified inplace.

This especially affects CPU training where
CPU_Accelerator.is_available() returns True and available_memory()
returns system RAM, so the on-device path is always taken.

Fix: add .detach() to the flattened buffer, matching the implicit detach
behavior of the CPU-offload path (param.data.cpu() + .to(device)).
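
The failure mode and the effect of `.detach()` can be reproduced outside DeepSpeed with a minimal sketch. It assumes plain PyTorch and uses `torch.split` as a stand-in for `_unflatten_dense_tensors`; the helper names `flatten_params` and `inplace_update_raises` are hypothetical and are not part of DeepSpeed or of this change.

```python
import torch
from torch import nn


def flatten_params(params, detach):
    """Concatenate parameters into one flat buffer (hypothetical helper)."""
    flat = torch.cat([p.reshape(-1) for p in params])
    # Without detach(), the buffer stays attached to the autograd graph.
    return flat.detach() if detach else flat


def inplace_update_raises(flat, sizes):
    """Return True if an in-place copy_() on a split view raises RuntimeError."""
    # torch.split stands in for _unflatten_dense_tensors here (assumption).
    views = torch.split(flat, sizes)
    try:
        views[0].copy_(torch.zeros_like(views[0]))
    except RuntimeError:
        return True
    return False


params = [nn.Parameter(torch.randn(3)), nn.Parameter(torch.randn(2))]
sizes = [p.numel() for p in params]

attached = flatten_params(params, detach=False)
detached = flatten_params(params, detach=True)

print("attached grad_fn:", attached.grad_fn)
print("detached grad_fn:", detached.grad_fn)
print("attached raises:", inplace_update_raises(attached, sizes))
print("detached raises:", inplace_update_raises(detached, sizes))
```

In this sketch the attached buffer carries a `grad_fn` and rejects the in-place write to its split view, while the detached buffer has no `grad_fn` and accepts it.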

Also rename flatten_on_gpu -> flatten_on_accelerator and replace
GPU-specific terminology in comments/logs with accelerator-generic
equivalents.

---------

Signed-off-by: Guokai Ma <guokai.ma@intel.com>
Signed-off-by: Ma, Guokai <guokai.ma@gmail.com>
Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>
Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>
Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>
Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>
(cherry picked from commit c7f1b5cd3f84bf5cdc37a48515eaff5f06580fb4)
Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>
Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>

2 participants