## MegaDPP

### Environment Configuration

- The following is the pod configuration.

```bash
cd megatron/shm_tensor_new_rdma_pre_alloc
pip install -e .
```
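After the editable install finishes, you can confirm that pip registered the package. The exact distribution name depends on the package's setup files, so the sketch below simply lists every editable install rather than assuming a name:

```shell
# List packages installed in editable (development) mode;
# the extension installed above should appear in this output.
pip list --editable
```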

### Run

#### Dataset Preparation

301301
302302The dataset preparation step follows largely from the Megatron framework.
303303
@@ -340,7 +340,7 @@ python ../tools/preprocess_data.py \
340340
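As a concrete sketch, a GPT-style preprocessing invocation in the spirit of upstream Megatron-LM's `tools/preprocess_data.py` might look like the following. The file paths are placeholders and the flags are the common upstream ones, so adjust both to your corpus and tokenizer; the command is assembled into an array and echoed first so it can be inspected before launching:

```shell
# Sketch of a Megatron-style preprocessing command for a GPT dataset.
# All file paths are placeholders; flags follow upstream Megatron-LM conventions.
PREPROCESS=(python ../tools/preprocess_data.py
  --input my-corpus.json
  --output-prefix my-gpt2
  --vocab-file gpt2-vocab.json
  --merge-file gpt2-merges.txt
  --tokenizer-type GPT2BPETokenizer
  --workers 8
  --append-eod)

# Print the assembled command for inspection.
echo "${PREPROCESS[@]}"

# Run it only when the preprocessing script is actually present.
if [ -f ../tools/preprocess_data.py ]; then
  "${PREPROCESS[@]}"
fi
```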
For other models, please refer to `nvidia/megatron` for the corresponding datasets.

#### Single Node Distributed Training

To run distributed training on a single node, go to the project root directory and run

```bash
bash examples/<model>/<train_file>.sh
```
Alternatively, write a file similar to `run_{single,master,worker}_<model>.sh` that sets up the configuration and runs the corresponding script under `examples/`.

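A minimal wrapper in that spirit might look like the sketch below. The exported variable names and the `gpt` example values are assumptions modeled on common Megatron launch scripts, not this repository's exact interface, so match them to whatever your `examples/` script actually reads:

```shell
#!/bin/bash
# Hypothetical run_single_<model>.sh-style wrapper (variable names are assumptions).
export GPUS_PER_NODE=8        # GPUs available on this node
export NNODES=1               # single-node run
export NODE_RANK=0
export MASTER_ADDR=localhost
export MASTER_PORT=6000

MODEL=gpt                     # placeholder: a model directory under examples/
TRAIN_FILE=pretrain_gpt       # placeholder: the training script's basename

SCRIPT="examples/${MODEL}/${TRAIN_FILE}.sh"
if [ -f "$SCRIPT" ]; then     # launch only if the script exists at this path
  bash "$SCRIPT"
fi
```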
#### Multinode Distributed Training

To run distributed training on multiple nodes, go to the project root directory. First run

```bash