[Feature] Migrate AICPU launch to new rtsLaunchCpuKernel interface (BUILD_WITH_NEW_CANN) #356

@hw-native-sys-bot

Description

Summary

Migrate the AICPU kernel launch path to use the new rtsLaunchCpuKernel / rtsBinaryLoadFromFile / rtsFuncGetByName API, with a two-layer dispatcher SO architecture that allows different runtimes to load different AICPU kernel SOs at runtime.

Motivation / Use Case

Current state (simpler):

Both a2a3 and a5 platform backends launch AICPU kernels through CANN's built-in libaicpu_extend_kernels.so:

rtAicpuKernelLaunchExWithArgs(
    rtKernelType_t::KERNEL_TYPE_AICPU_KFC, "AST_DYN_AICPU",
    aicpu_num, &rt_args, nullptr, stream, 0);

Problems:

  • The SO name (libaicpu_extend_kernels.so) is hardcoded — only one fixed SO can be loaded
  • Different runtimes cannot load different AICPU kernel implementations at runtime
  • Kernel and SO name strings must be packed into the argument struct manually, using offsetof
  • Legacy API may be deprecated in future CANN versions
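To illustrate the offsetof-based packing the third bullet refers to, here is a minimal sketch. The struct layout, field sizes, and helper name are hypothetical and chosen only for illustration; the real argument struct expected by rtAicpuKernelLaunchExWithArgs is not shown in this issue.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical layout: kernel arguments followed by two inline name strings.
// The real legacy struct may differ; this only shows the packing pattern.
struct PackedLaunchArgs {
    uint64_t user_args;     // placeholder for the kernel's actual arguments
    char kernel_name[32];   // inline, NUL-terminated kernel name
    char so_name[64];       // inline, NUL-terminated SO name
};

// Byte offsets of the name strings inside the packed struct (hypothetical).
struct NameOffsets {
    uint32_t kernel_name_offset;
    uint32_t so_name_offset;
};

// Copies both names into the struct and returns their offsets, which the
// caller would then have to pass to the launch API by hand.
inline NameOffsets pack_names(PackedLaunchArgs& args,
                              const char* kernel, const char* so) {
    std::memset(args.kernel_name, 0, sizeof(args.kernel_name));
    std::memset(args.so_name, 0, sizeof(args.so_name));
    std::strncpy(args.kernel_name, kernel, sizeof(args.kernel_name) - 1);
    std::strncpy(args.so_name, so, sizeof(args.so_name) - 1);
    return {static_cast<uint32_t>(offsetof(PackedLaunchArgs, kernel_name)),
            static_cast<uint32_t>(offsetof(PackedLaunchArgs, so_name))};
}
```

Every change to the struct layout silently changes these offsets, which is part of why the manual approach is fragile.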

Target architecture:

Two-layer SO dispatch (matching pypto's pypto_aicpu_interface pattern) + new CANN launch API:

  1. Dispatcher SO (outer, fixed) — runs on AICPU, exports:

    • DynTileFwkDispatcherLoad — receives inner SO binary, saves to AICPU filesystem, dlopen + dlsym
    • DynTileFwkDispatcherInit — delegates to inner SO's init
    • DynTileFwkDispatcherRun — delegates to inner SO's run
  2. Runtime SO (inner, replaceable) — different runtimes load different SOs with different names

  3. Host-side LoadAicpuOp — generates JSON descriptor → rtsBinaryLoadFromFile → rtsFuncGetByName → rtsLaunchCpuKernel
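The dispatcher layer described above could look roughly like the following sketch. Only the three exported names come from this issue; the `void*` argument convention, the `DispatcherLoadArgs` struct, the inner symbol names, and the error codes are assumptions made for illustration, not the actual implementation.

```cpp
#include <dlfcn.h>
#include <cstddef>
#include <cstdio>

namespace {
using InnerFn = int (*)(void*);  // assumed signature of inner SO entry points
void* g_handle = nullptr;        // handle of the currently loaded inner SO
InnerFn g_init = nullptr;
InnerFn g_run = nullptr;
}  // namespace

// Hypothetical argument block for the load step.
struct DispatcherLoadArgs {
    const char* so_path;      // where to save the inner SO on the AICPU filesystem
    const void* so_data;      // inner SO binary
    size_t so_size;           // size of the binary in bytes
    const char* init_symbol;  // name of the inner SO's init function
    const char* run_symbol;   // name of the inner SO's run function
};

extern "C" int DynTileFwkDispatcherLoad(void* raw) {
    auto* a = static_cast<DispatcherLoadArgs*>(raw);
    if (a == nullptr || a->so_path == nullptr || a->so_data == nullptr) return -1;

    // Save the received binary to a file so that dlopen can map it.
    FILE* f = std::fopen(a->so_path, "wb");
    if (f == nullptr) return -2;
    size_t written = std::fwrite(a->so_data, 1, a->so_size, f);
    if (std::fclose(f) != 0 || written != a->so_size) return -2;

    void* h = dlopen(a->so_path, RTLD_NOW | RTLD_LOCAL);
    if (h == nullptr) return -3;

    auto init = reinterpret_cast<InnerFn>(dlsym(h, a->init_symbol));
    auto run = reinterpret_cast<InnerFn>(dlsym(h, a->run_symbol));
    if (init == nullptr || run == nullptr) {
        dlclose(h);
        return -4;
    }

    if (g_handle != nullptr) dlclose(g_handle);  // replace any previous inner SO
    g_handle = h;
    g_init = init;
    g_run = run;
    return 0;
}

// Init and Run only forward to the inner SO; they fail if nothing is loaded.
extern "C" int DynTileFwkDispatcherInit(void* args) {
    return g_init != nullptr ? g_init(args) : -5;
}

extern "C" int DynTileFwkDispatcherRun(void* args) {
    return g_run != nullptr ? g_run(args) : -5;
}
```

Because the outer SO only stores function pointers, swapping runtimes amounts to calling the load entry point again with a different binary and symbol names.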

Current Progress

Done:

  • Dispatcher SO implemented (src/common/aicpu_dispatcher/)
  • Host-side LoadAicpuOp wrapper implemented (src/common/host/)
  • Build system updated (dispatcher target, RuntimeBinaries, parallel build)
  • DeviceRunner integration (DispatcherLoad → Init → Run three-step launch)
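The three-step launch order in the last bullet can be sketched as follows. The `LaunchFn` callback is a hypothetical stand-in for "launch this dispatcher entry point on AICPU and wait for it"; in the real DeviceRunner that role would be played by the new CANN launch path, whose exact signatures are not given in this issue.

```cpp
#include <functional>
#include <string>

// Hypothetical callback: launches one dispatcher entry point by name and
// returns 0 on success or a non-zero error code.
using LaunchFn = std::function<int(const std::string& entry)>;

struct LaunchResult {
    int status;               // 0 on success, otherwise the failing step's code
    std::string failed_step;  // empty on success
};

// Runs Load, Init, Run in order and stops at the first failure, so Init is
// never attempted without a loaded inner SO and Run never without Init.
inline LaunchResult launch_via_dispatcher(const LaunchFn& launch) {
    static const char* const kSteps[] = {
        "DynTileFwkDispatcherLoad",
        "DynTileFwkDispatcherInit",
        "DynTileFwkDispatcherRun",
    };
    for (const char* step : kSteps) {
        int rc = launch(step);
        if (rc != 0) return {rc, step};
    }
    return {0, ""};
}
```

Reporting which step failed makes it easier to tell a load problem (for example, a bad SO path) from a failure inside the inner runtime.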

Blocker:

  • rtsBinaryLoadFromFile returns ACL_ERROR_RT_PARAM_INVALID (107000) on CANN 8.5.0 for any input, including CANN's own built-in aicpu_kernel.json. The root cause still needs investigation; it is likely a missing initialization step, an environment configuration issue, or a CANN version requirement.

Scope

| File | Change |
| --- | --- |
| src/common/aicpu_dispatcher/aicpu_dispatcher.{h,cpp,CMakeLists.txt} | New: dispatcher SO |
| src/common/host/load_aicpu_op.{h,cpp} | New: host-side new API wrapper |
| src/a2a3/platform/onboard/host/device_runner.{h,cpp} | Replace launch_aicpu_kernel with new API path |
| src/a5/platform/onboard/host/device_runner.{h,cpp} | Same |
| src/{a2a3,a5}/platform/onboard/host/pto_runtime_c_api.cpp | Set dispatcher SO path |
| src/{a2a3,a5}/platform/onboard/host/CMakeLists.txt | Add load_aicpu_op source + rts include path |
| python/simpler/runtime_compiler.py | Add dispatcher build target |
| simpler_setup/runtime_builder.py | Build dispatcher in parallel, dispatcher_path in RuntimeBinaries |
| examples/scripts/runtime_builder.py | Same |

Reference

  • pypto dispatcher: framework/src/machine/device/machine_interface/pypto_aicpu_interface.{h,cpp}
  • pypto host-side: framework/src/machine/runtime/load_aicpu_op.{h,cpp}

Metadata

Labels: enhancement (New feature or request)

Status: In Progress
