It looks like DeepSpeed is not fork-safe:

- During import it checks `builder.is_compatible()` for all ops
- Some ops use `torch.cuda.get_device_properties`
- This initializes the CUDA context

See `compatible_ops[op_name] = op_compatible` in DeepSpeed/deepspeed/git_version_info.py (line 30 at 38bd11a).
When the process then forks to run tests in parallel, any access to CUDA through PyTorch will fail:

```
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
```
I'm not sure why this isn't fully consistent, but I can reproduce it with:

```
pytest --forked tests/unit/ops/deepspeed4science/test_DS4Sci_EvoformerAttention.py -k 'test_DS4Sci_EvoformerAttention[tensor_shape1-dtype1]' -s
```

It fails when invoking `skip_on_arch(8 if dtype == torch.bfloat16 else 7)`, which calls `torch.cuda.get_device_properties`, now in the forked subprocess.
This is an issue in general: fork-based multiprocessing cannot be used after importing deepspeed.
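The failure mode, and the `'spawn'` workaround that the error message points to, can be sketched without CUDA at all. Here `init_context` is an illustrative stand-in for the one-time native context that `torch.cuda.get_device_properties` creates, not a DeepSpeed or PyTorch API:

```python
import multiprocessing as mp
import os

# Simulated one-time native context: it remembers which process created it
# and refuses reuse from a fork()ed child, mimicking PyTorch's CUDA guard.
_owner_pid = None

def init_context():
    global _owner_pid
    if _owner_pid is None:
        _owner_pid = os.getpid()
    elif _owner_pid != os.getpid():
        raise RuntimeError("Cannot re-initialize in forked subprocess")

def worker(q):
    try:
        init_context()
        q.put("ok")
    except RuntimeError as exc:
        q.put(f"error: {exc}")

results = {}
if __name__ == "__main__":
    init_context()  # the "import deepspeed" step: context created in the parent
    for method in ("fork", "spawn"):
        ctx = mp.get_context(method)
        q = ctx.Queue()
        p = ctx.Process(target=worker, args=(q,))
        p.start()
        result = q.get()
        p.join()
        results[method] = result
        print(method, "->", result)
```

The forked child inherits the parent's already-initialized context and fails; the spawned child starts from a fresh interpreter and succeeds. That matches what the PyTorch error message recommends, but pytest-forked uses `fork`, which is exactly the unsafe case.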
But it also contradicts the documentation:

> Note that pytest-forked and the --forked flag are required to test CUDA functionality in distributed tests.

It seems the opposite is true: the flag must not be used.
Or am I missing anything?