
perf(cuda/elementwise) #140

Merged

kilinchange merged 1 commit into master from perf/cuda_elementwise_kernel on Apr 8, 2026
Conversation

@kilinchange (Collaborator) commented Apr 7, 2026

Pass broadcast strides by value to eliminate the per-call cudaMallocAsync.

@kilinchange changed the title from "perf(cuda/elementwise): pass broadcast strides by value to kill per-call cudaMallocAsync" to "perf(cuda/elementwise):" on Apr 7, 2026
@kilinchange changed the title from "perf(cuda/elementwise):" to "perf(cuda/elementwise)" on Apr 7, 2026
@kilinchange (Collaborator, Author) commented Apr 7, 2026

Performance improvement:

[screenshot: benchmark results]

@kilinchange (Collaborator, Author) commented Apr 7, 2026

Accuracy:

  • The accuracy fluctuation on the gpt2_1_bfloat16 test case is within the acceptable range relative to torch (torch reference: step 6/10 | train loss 5.063675, step 7/10 | train loss 4.845804, step 10/10 | train loss 5.221752)
  • lora has a known accuracy fluctuation

[screenshots: accuracy comparison]

@kilinchange kilinchange requested a review from chen2021673 April 7, 2026 10:58
// Maximum number of dimensions supported by the broadcast metadata.
// Real-world tensors in this codebase top out at 4-5 dims, so 8 leaves comfortable headroom
// while keeping the struct under the 4 KB CUDA kernel parameter limit.
constexpr int kMaxBroadcastDims = 8;
Contributor:

Agreed, 8 is generally enough in practice. If a case with more than 8 dims shows up later, it can still fall back to the memcpy version.

@chen2021673 (Contributor) left a review comment:

LGTM

@kilinchange kilinchange merged commit cfe7bf8 into master Apr 8, 2026
2 checks passed
@kilinchange kilinchange deleted the perf/cuda_elementwise_kernel branch April 8, 2026 03:11

2 participants