# always_save_context=False, # Optional, defaults to False
# write_files_per_rank=1, # Optional, defaults to 1
# initial_write_buffer_size_bytes=DESIRED_NUM_BYTES, # Optional, defaults to 16 GB
+# use_optimized_save=True, # Optional, defaults to True. Uses the optimized save method to reduce write time.
+# use_cached_ckpt_structure=True, # Optional, defaults to False. Caches the checkpoint structure after identifying 2 consecutive save plan structures that are equal.
)
```
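The `use_cached_ckpt_structure` comment describes caching a checkpoint structure once two consecutive save plan structures are equal. The idea can be sketched in isolation as below. This is a minimal illustration, not the library's implementation: `PlanStructureCache`, `observe`, and the tuple-based structure are hypothetical names and types chosen for the example.

```python
class PlanStructureCache:
    """Hypothetical helper: start reusing a structure once it is seen twice in a row."""

    def __init__(self):
        self._previous = None  # structure seen on the previous save
        self.cached = None     # structure promoted to the cache, if any

    def observe(self, structure):
        """Record the structure for this save; return True once a cached one exists."""
        if self.cached is None and structure == self._previous:
            # Two consecutive saves produced equal structures, so cache it.
            self.cached = structure
        self._previous = structure
        return self.cached is not None


cache = PlanStructureCache()
first = cache.observe(("layer.weight", "layer.bias"))   # nothing to compare against yet
second = cache.observe(("layer.weight", "layer.bias"))  # equal to the previous structure
```

In this sketch the first call only records the structure, and the second call promotes it to the cache because it matches the previous one.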
@@ -126,6 +128,7 @@ from ml_flashpoint.adapter.megatron.save_strategies import (
)
# Loading
+import torch.distributed as dist
from ml_flashpoint.adapter.megatron.load_strategies import MLFlashpointMegatronLoadStrategy
from ml_flashpoint.checkpoint_object_manager.checkpoint_object_manager import CheckpointObjectManager
from ml_flashpoint.core.checkpoint_loader import DefaultMLFlashpointCheckpointLoader
# use_cached_ckpt_structure=True, # Optional, defaults to False. Caches the checkpoint structure after identifying 2 consecutive save plan structures that are equal.
# Instantiate the Load Strategy with the dependencies
@@ -229,11 +238,12 @@ Code: See the [`ml_flashpoint.adapter.pytorch`](https://github.com/google/ml-fla
To use directly with PyTorch DCP, use the provided `StorageWriter` and `StorageReader` implementations.
You can use whatever `Planner` implementations work for your use case, or resort to the defaults.

-If your per-rank checkpoint data exceeds the default buffer size (16 GB as of this writing), you can increase it using the optional `initial_buffer_size_bytes` parameter.
+If your per-rank checkpoint data exceeds the default buffer size (16 GB as of this writing), you can increase it using the optional `initial_buffer_size_bytes` parameter.
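
To judge whether a rank's data is likely to exceed the default, a rough size estimate can help. The sketch below is a standalone illustration in plain Python, not part of the library. It assumes "16 GB" means 16 * 1024**3 bytes, which the source does not specify. The helper names, the dtype table, and the 10% headroom are hypothetical. The result is a value you might pass as `initial_buffer_size_bytes`.

```python
from math import prod

# Assumption: the 16 GB default is interpreted as 16 GiB; the library may define it differently.
DEFAULT_BUFFER_BYTES = 16 * 1024**3

# Bytes per element for a few common dtypes (illustrative subset).
DTYPE_BYTES = {"float32": 4, "bfloat16": 2, "float16": 2, "int64": 8}


def estimate_checkpoint_bytes(tensors):
    """Sum the raw payload size of (shape, dtype_name) pairs held by one rank."""
    return sum(prod(shape) * DTYPE_BYTES[dtype] for shape, dtype in tensors)


def suggested_buffer_bytes(tensors, headroom_percent=10):
    """Return None if the assumed default suffices, else the estimate plus headroom."""
    needed = estimate_checkpoint_bytes(tensors) * (100 + headroom_percent) // 100
    return needed if needed > DEFAULT_BUFFER_BYTES else None


# Hypothetical per-rank contents: 600 bfloat16 and 200 float32 matrices of 4096 x 4096.
rank_tensors = [((4096, 4096), "bfloat16")] * 600 + [((4096, 4096), "float32")] * 200
buffer_bytes = suggested_buffer_bytes(rank_tensors)
```

The estimate counts tensor payload only. Serialization metadata and non-tensor state are not included, which is why the sketch adds headroom.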