feat(ascend): add Ascend framework layer — runtime, type mapping, bui…#46

Open
zhangyue207 wants to merge 7 commits into master from feat/ascend-framework

@zhangyue207
Collaborator

…ld integration

Add Ascend platform scaffolding:

  • device_.h: DeviceEnabled<kAscend> specialization
  • data_type_.h: toAclDtype(), isIntegerDtype()
  • common.h: buildAclTensor() with optional transpose
  • workspace_pool_.h: stream-keyed workspace allocator
  • runtime_.h: Runtime<kAscend> (Malloc, Free, Memcpy, Memset)
  • 5 new operator base classes (AddRmsNorm, FlashAttention, Matmul, ReshapeAndCache, RotaryEmbedding)

Integrate into CMake build system, Python binding generation (stream + optional tensor support), and examples runtime API.
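
To make "stream-keyed workspace allocator" concrete, here is a minimal sketch of the general idea, not the code in `workspace_pool_.h`. Only the method name `ensure` comes from the PR. The class name `WorkspacePoolSketch`, the `Arena` struct, the `FakeStream` alias, and the use of `std::malloc` in place of a device allocation are all hypothetical stand-ins so the example compiles without the Ascend toolkit.

```cpp
#include <cstddef>
#include <cstdlib>
#include <unordered_map>

// Hypothetical stand-in for a device stream handle.
using FakeStream = void*;

// Sketch: one growable scratch buffer per stream.
class WorkspacePoolSketch {
 public:
  struct Arena {
    void* buf = nullptr;
    std::size_t size = 0;
  };

  WorkspacePoolSketch() = default;
  WorkspacePoolSketch(const WorkspacePoolSketch&) = delete;
  WorkspacePoolSketch& operator=(const WorkspacePoolSketch&) = delete;

  // Returns the arena for `stream`, reallocating only when the
  // requested size exceeds what the arena already holds.
  Arena& ensure(FakeStream stream, std::size_t size) {
    Arena& arena = arenas_[stream];
    if (size > arena.size) {
      std::free(arena.buf);
      arena.buf = std::malloc(size);  // assumption: host malloc stands in for device memory
      arena.size = arena.buf ? size : 0;
    }
    return arena;
  }

  ~WorkspacePoolSketch() {
    for (auto& entry : arenas_) std::free(entry.second.buf);
  }

 private:
  std::unordered_map<FakeStream, Arena> arenas_;
};
```

Keying by stream means operations queued on different streams never share scratch memory, while repeated calls on the same stream reuse one buffer instead of allocating per call.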

zhangyue added 5 commits April 8, 2026 10:52
…ld integration

Add Ascend platform scaffolding:
- `device_.h`: `DeviceEnabled<kAscend>` specialization
- `data_type_.h`: `toAclDtype()`, `isIntegerDtype()`
- `common.h`: `buildAclTensor()` with optional transpose
- `workspace_pool_.h`: stream-keyed workspace allocator
- `runtime_.h`: `Runtime<kAscend>` (Malloc, Free, Memcpy, Memset)
- 5 new operator base classes (AddRmsNorm, FlashAttention, Matmul,
  ReshapeAndCache, RotaryEmbedding)

Integrate into CMake build system, Python binding generation (stream +
optional tensor support), and examples runtime API.

…emove missing include

- Wrap `aclrtMemcpy` (5-arg) and `aclrtMemset` (4-arg) in lambdas to
  match the generic 4-arg / 3-arg calling convention used by examples.
- Assert `aclrtMalloc` return value in `WorkspacePool::ensure()`.
- Remove `ascend/gemm/kernel.h` include from `runtime_api.h` (file
  does not exist until the kernels commit).
- Add Ascend GEMM specialization using `aclnnAddmm`/`aclnnBaddbmm`.
- Add `get_npu_stream()` helper and NPU device detection in test utils.
- Add `skip_unsupported_dtype` fixture for Ascend in conftest.
- Update `runtime_api.h` with Ascend backend entry.
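
The lambda wrapping described in the first bullet above can be sketched as follows. This is an illustration of the adapter pattern only. `fake_memcpy5` and `fake_memset4` are hypothetical stand-ins that mirror the argument counts of `aclrtMemcpy` and `aclrtMemset`, and the names `Memcpy4` and `Memset3` are invented here. Passing `count` as the destination capacity is an assumption; the PR does not say how the extra argument is filled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical copy-direction tag, standing in for the runtime's enum.
enum FakeCopyKind { kFakeHostToHost };

// Stand-in with five parameters: destination, destination capacity,
// source, byte count, copy kind.
int fake_memcpy5(void* dst, std::size_t dst_max, const void* src,
                 std::size_t count, FakeCopyKind /*kind*/) {
  assert(dst != nullptr && src != nullptr);
  if (count > dst_max) return 1;
  std::memcpy(dst, src, count);
  return 0;
}

// Stand-in with four parameters: destination, destination capacity,
// fill value, byte count.
int fake_memset4(void* dst, std::size_t dst_max, int value,
                 std::size_t count) {
  assert(dst != nullptr);
  if (count > dst_max) return 1;
  std::memset(dst, value, count);
  return 0;
}

// Adapter exposing the generic 4-argument copy convention.
// Assumption: the destination capacity equals the byte count.
static const auto Memcpy4 = [](void* dst, const void* src,
                               std::size_t count, FakeCopyKind kind) {
  return fake_memcpy5(dst, count, src, count, kind);
};

// Adapter exposing the generic 3-argument fill convention.
static const auto Memset3 = [](void* dst, int value, std::size_t count) {
  return fake_memset4(dst, count, value, count);
};
```

Callers written against the generic convention can then invoke `Memcpy4(dst, src, n, kind)` and `Memset3(dst, 0, n)` without knowing about the extra capacity argument.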

The `aclrtMalloc` call was the sole expression inside `assert()`, so it
was compiled away in release builds (NDEBUG). This left the workspace
buffer null, causing `aclnnAddmm` to return ACLNN_ERR_PARAM_NULLPTR
(161001) for any operation that requires workspace (e.g. alpha != 1.0).
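
The failure mode described above can be reproduced without the Ascend toolkit. In this sketch, `ASSERT_AS_IN_RELEASE` is a hypothetical macro that expands to `((void)0)`, which is what the standard `assert` expands to when `NDEBUG` is defined. `fake_malloc` is a stand-in for an allocator that returns a status code and writes the pointer through an out-parameter. The "fixed" variant shows one common remedy and is not necessarily the exact change made in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Mirrors assert() under NDEBUG: the argument is never evaluated.
#define ASSERT_AS_IN_RELEASE(expr) ((void)0)

// Hypothetical allocator: returns 0 on success, writes the pointer to *ptr.
int fake_malloc(void** ptr, std::size_t size) {
  *ptr = std::malloc(size);
  return *ptr ? 0 : 1;
}

// Buggy: the allocation is the sole expression inside the assertion,
// so in a release build the call disappears with it.
void* ensure_buggy(std::size_t size) {
  void* buf = nullptr;
  ASSERT_AS_IN_RELEASE(fake_malloc(&buf, size) == 0);
  (void)size;  // the macro discards the only use of `size`
  return buf;  // still nullptr
}

// Fixed: perform the call unconditionally, then check the saved status.
void* ensure_fixed(std::size_t size) {
  void* buf = nullptr;
  int status = fake_malloc(&buf, size);
  assert(status == 0);
  (void)status;  // avoids an unused-variable warning under NDEBUG
  return buf;
}
```

The general rule is that an expression with side effects should never live only inside `assert()`; store the result first and assert on the stored value.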
zhangyue207 force-pushed the feat/ascend-framework branch from 440b428 to 21533e3 on April 8, 2026 07:05