Commit 38ee37b
Edge placement update (#11)
Updated the DGraph API for the NCCL and NVSHMEM backends, with runnable OGB and benchmark code. Incorporates caching to reduce overhead.
In detail:
* Add multi-partition function
* Add preprocessing code
* Add padding functionality to NVSHMEM operations, enabling arbitrarily
shaped inputs to these operations
* Revamped the Distributed Graph implementation to simplify the code and
remove the old slicing-based code
- Also adds documentation
* Updating Graphcast implementation with DGraph distributed
* Incorporate NCCL cache into the backend engine
* Update local tensor getter to use a placement tensor mask
* Add Graphcast distributed trainer
* Fix distributed graph object
* Save intermediate preprocessed file so it can be reused
* Remove unnecessary complexity of file passing and use a single torch tensor
* Update the GraphCast static graph generator with preprocessing code
* Updated mesh graph placement algorithm
* Fixed scatter test with new API
* Update distributed GCN with edge placement tensor
* Add GatherCacheGenerator and ScatterCacheGenerator
* Add Graphcast preprocessing
* Add static method for mesh partitioning
* Add grid vertex placement logic to MeshGraph
* Add OGBN-products update
* Overly complicated but correct graph data
* Add separated out benchmarking code for small tests
* Fix missing batch dim
* More general fix
* Disabled some incomplete NVSHMEM caching optimizations. Added code to
set the default PyTorch device.
* Add NVSHMEM benchmark code
- Append backend type to output files
- Add sample plot generation code
* Adding torch distributed init with NVSHMEM communicator
* Apply suggestions from code review
* Fixed plotting script to grab the right log files.
* Fix the cached benchmarks
* Apply review suggestions, remove dead code
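The padding item for NVSHMEM operations is plausibly needed because NVSHMEM symmetric-heap allocations (`nvshmem_malloc`) are collective and must request the same size on every PE. Ragged per-rank inputs therefore have to be brought to a common shape first. The following is a minimal NumPy sketch of that idea, not DGraph's implementation. It assumes padding happens along the first dimension to the largest row count across ranks. `pad_to_max_rows` and `unpad` are hypothetical helpers.

```python
from __future__ import annotations

import numpy as np


def pad_to_max_rows(local: np.ndarray, all_row_counts: list[int]) -> tuple[np.ndarray, int]:
    """Hypothetical helper: zero-pad `local` along dim 0 to the max row count over ranks.

    Returns the padded array plus the original row count, so the receiver
    can strip the padding after the symmetric-buffer exchange.
    """
    max_rows = max(all_row_counts)
    n = local.shape[0]
    if n > max_rows:
        raise ValueError("local row count exceeds the reported maximum")
    # Pad only the first dimension; trailing (feature) dimensions stay unchanged.
    pad_width = [(0, max_rows - n)] + [(0, 0)] * (local.ndim - 1)
    return np.pad(local, pad_width), n  # default mode="constant" pads with zeros


def unpad(padded: np.ndarray, n: int) -> np.ndarray:
    """Hypothetical helper: drop the padding rows added by pad_to_max_rows."""
    return padded[:n]
```

Every rank would allocate a `max_rows`-row buffer and carry its true `n` alongside the data. Any arbitrarily shaped input then fits a uniform symmetric allocation.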
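Several items above mention a placement tensor mask for the local tensor getter and a `GatherCacheGenerator`/`ScatterCacheGenerator` used for caching. The sketch below illustrates the general idea under stated assumptions. It assumes the cache holds a per-rank communication plan computed once from a static graph and reused on every step. Each rank selects its edges with a mask over an edge-placement array, then groups the remote source vertices it must receive by owning rank. `GatherPlanCache`, its fields, and the plan layout are hypothetical. They are not DGraph's actual classes.

```python
from __future__ import annotations

import numpy as np


class GatherPlanCache:
    """Hypothetical cache of per-rank gather plans for a static graph."""

    def __init__(self, src: np.ndarray, edge_placement: np.ndarray, vertex_placement: np.ndarray):
        self.src = src                            # source vertex id of each edge
        self.edge_placement = edge_placement      # rank assigned to each edge
        self.vertex_placement = vertex_placement  # rank that owns each vertex
        self._plans: dict[int, dict[int, np.ndarray]] = {}
        self.builds = 0                           # number of plans actually computed

    def plan(self, rank: int) -> dict[int, np.ndarray]:
        """Map owner rank -> sorted vertex ids this rank must fetch from it."""
        if rank not in self._plans:
            self.builds += 1
            mask = self.edge_placement == rank    # placement mask: edges local to this rank
            needed = np.unique(self.src[mask])    # source vertices read by local edges
            owners = self.vertex_placement[needed]
            self._plans[rank] = {
                int(owner): needed[owners == owner]
                for owner in np.unique(owners)
                if owner != rank                  # locally owned vertices need no communication
            }
        return self._plans[rank]
```

Because the graph is static, the mask and grouping work runs once per rank. Later calls return the stored plan.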
---------
Co-authored-by: Brian C. Van Essen <vanessen1@llnl.gov>

1 parent 1692164
31 files changed
Lines changed: 2730 additions & 517 deletions
File tree
- DGraph
- data
- distributed
- csrc
- include
- mpi
- nccl
- nvshmem
- experiments
- Benchmarks
- GraphCast
- data_utils
- OGB
- tests