Commit 421f683
feat(maca): add MetaX MACA backend skeleton and minimal kernels

Introduce a MACA (MetaX 沐曦) backend plugged into the DeviceGuardImpl /
kernel dispatcher framework, targeting the minimal kernel set needed to
validate single-card fp32 training (e.g. mnist) end-to-end:
- Build system: USE_MACA / USE_MCCL options, mxcc toolchain override,
mxomp linkage under USE_OMP, .maca kernel library with -x maca, and
backend-exclusive SRC filtering so non-target backends are not pulled in.
- Device enum: add Device::DeviceType::kMACA (kCount bumped to 3),
IsMACA(), and a three-way ToString() switch.
- common/maca: MACA_CHECK / MCBLAS_CHECK / MCCL_CHECK macros and the
kernel_helper.cuh template library (Cast/Neg/Sin/Pow/Add/Sub/Mul/Div/
Max/Min/Fma/fastAtomicAdd) plus a cub_compat.cuh shim pinning CubSumOp/
CubMaxOp/CubMinOp to the pre-2.8 CUB API that MACA ships.
- core/runtime/maca: MacaStream / MacaEvent / MacaBlasHandle derived from
core::Stream / Event / BlasHandle, and MacaGuardImpl mirroring
CudaGuardImpl (mcInit(0) in ctor, call_once'd default stream/handle
caches, full stream/event/sync/blas/memory surface). Mempool watermark
hooks are stubs pending SDK verification.
- datatype.h / tensor.cc / nn/init.cc: add USE_MACA branches to map
kBFLOAT16 / kFLOAT16 to __maca_bfloat16 / __half, specialize the
is_floating_point_ext / is_arithmetic_ext / LargerType traits, route
Fill casts through float under real device backends to dodge the
ambiguous __half(int) constructor on MACA, and wire Arange for bf16/fp16.
- kernels/maca: mechanically port the minimal 5-kernel slice
(elementwise, linear, fill, no_op, accumulate_grad) from their .cu
counterparts, switching blas/stream acquisition to the new
GetDeviceGuardImpl()->GetBlasHandle()/GetStream() idiom.
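
As a rough illustration of the Device enum item above, the following is a minimal sketch of what the extended enum and its helpers might look like. Only `kMACA`, `kCount`, `IsMACA()` and `ToString()` are named in this message; the class layout, the `kCPU` / `kCUDA` enumerators and their order, the `index` field, and the string format are assumptions made for the example, not the actual header.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Illustrative stand-in for the framework's Device class (hypothetical layout).
class Device {
public:
    enum class DeviceType : std::int8_t {
        kCPU = 0,   // assumed pre-existing backend
        kCUDA = 1,  // assumed pre-existing backend
        kMACA = 2,  // new in this commit
        kCount = 3, // sentinel: number of backends, bumped from 2 to 3
    };

    Device(DeviceType type, int index) : type_(type), index_(index) {
        // kCount is a sentinel, never a valid device type.
        assert(type != DeviceType::kCount);
    }

    bool IsCPU() const { return type_ == DeviceType::kCPU; }
    bool IsCUDA() const { return type_ == DeviceType::kCUDA; }
    bool IsMACA() const { return type_ == DeviceType::kMACA; }

    // Three-way switch: one case per real backend.
    std::string ToString() const {
        switch (type_) {
        case DeviceType::kCPU:
            return "Device(CPU, " + std::to_string(index_) + ")";
        case DeviceType::kCUDA:
            return "Device(CUDA, " + std::to_string(index_) + ")";
        case DeviceType::kMACA:
            return "Device(MACA, " + std::to_string(index_) + ")";
        default:
            return "Device(Unknown)";
        }
    }

private:
    DeviceType type_;
    int index_;
};

// Any per-backend table sized by kCount grows automatically with the enum.
constexpr int kNumBackends = static_cast<int>(Device::DeviceType::kCount);
static_assert(kNumBackends == 3, "CPU, CUDA and MACA");
```

Keeping `kCount` as the last enumerator means code that sizes per-backend registries from it picks up the new backend without further changes.
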
The MCCL collective backend and the remaining 15 kernels (which are
required for gpt2 / DDP) will land in a follow-up commit.

1 parent be8d5a8 commit 421f683
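
The Fill change described in the message (routing casts through float to avoid an ambiguous `__half(int)` constructor) can be illustrated with a small host-only sketch. `FakeHalf` and `CastViaFloat` are hypothetical names invented here, and the pair of constructors is just one way such an ambiguity can arise; the real `__half` type on MACA may differ in detail.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical 16-bit float stand-in. It offers two floating-point
// constructors, so FakeHalf(1) with an int argument would not compile:
// int->float and int->double rank equally in overload resolution.
struct FakeHalf {
    explicit FakeHalf(float f) : value(f) {}
    explicit FakeHalf(double d) : value(static_cast<float>(d)) {}
    float value; // a real half stores 16 bits; float keeps the sketch simple
};

// Convert a fill scalar to the element type T by going through float first.
// The outer cast then always sees a float argument, which is an exact match
// for exactly one constructor regardless of the scalar's original type.
template <typename T, typename Scalar>
T CastViaFloat(Scalar v) {
    static_assert(std::is_arithmetic<Scalar>::value, "scalar must be arithmetic");
    return static_cast<T>(static_cast<float>(v));
}
```

For example, `CastViaFloat<FakeHalf>(3)` compiles where `FakeHalf(3)` would be rejected as ambiguous. The trade-off is that integers larger than 2^24 are no longer represented exactly once they pass through float, which is usually acceptable when the target type is fp16 or bf16.
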
16 files changed
Lines changed: 2936 additions & 5 deletions
File tree
- infini_train
- include
- common/maca
- src
- core/runtime/maca
- kernels/maca