Summary
Migrate the AICPU kernel launch path to use the new rtsLaunchCpuKernel / rtsBinaryLoadFromFile / rtsFuncGetByName API, with a two-layer dispatcher SO architecture that allows different runtimes to load different AICPU kernel SOs at runtime.
Motivation / Use Case
Current state (simpler):
Both a2a3 and a5 platform backends launch AICPU kernels through CANN's built-in libaicpu_extend_kernels.so:
rtAicpuKernelLaunchExWithArgs(
rtKernelType_t::KERNEL_TYPE_AICPU_KFC, "AST_DYN_AICPU",
aicpu_num, &rt_args, nullptr, stream, 0);
Problems:
- The SO name (libaicpu_extend_kernels.so) is hardcoded, so only one fixed SO can be loaded
- Different runtimes cannot load different AICPU kernel implementations at runtime
- Manual offsetof-based struct packing for kernel/SO name strings
- Legacy API may be deprecated in future CANN versions
Target architecture:
Two-layer SO dispatch (matching pypto's pypto_aicpu_interface pattern) + new CANN launch API:
- Dispatcher SO (outer, fixed): runs on AICPU and exports:
  - DynTileFwkDispatcherLoad: receives the inner SO binary, saves it to the AICPU filesystem, then calls dlopen + dlsym
  - DynTileFwkDispatcherInit: delegates to the inner SO's init
  - DynTileFwkDispatcherRun: delegates to the inner SO's run
- Runtime SO (inner, replaceable): different runtimes load different SOs with different names
- Host-side LoadAicpuOp: generates a JSON descriptor → rtsBinaryLoadFromFile → rtsFuncGetByName → rtsLaunchCpuKernel
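The dispatcher layering above can be sketched in C++ roughly as follows. This is a minimal, self-contained model under stated assumptions, not the simpler or pypto implementation. The InnerEntry signature (int(void*)), the inner symbol names RuntimeInit/RuntimeRun, the helpers MakeLookup/LoadInitRun, and the two sample runtimes are all hypothetical. The table-backed SymbolLookup stands in for the dlopen + dlsym step that DynTileFwkDispatcherLoad performs on the saved inner SO, so the sketch runs without a device or a real .so file. What it shows is the key property of the design: the outer Load/Init/Run entry points stay fixed while the inner implementation is swapped by name.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical shape of the inner SO's init/run entry points; the real
// signatures are defined by the runtime SO, not by this sketch.
using InnerEntry = int (*)(void* args);

// Stand-in for dlsym() on a dlopen()'d inner SO: resolves a symbol name
// to an entry point, or returns nullptr if the symbol is missing.
using SymbolLookup = std::function<InnerEntry(const std::string& name)>;

// Outer, fixed layer. Its methods mirror the exported
// DynTileFwkDispatcherLoad / Init / Run entry points.
class Dispatcher {
 public:
  // Load: resolve both inner symbols first and commit only if both exist,
  // so a failed load keeps the previously loaded runtime intact.
  int Load(const SymbolLookup& lookup, const std::string& init_sym,
           const std::string& run_sym) {
    InnerEntry init = lookup(init_sym);
    InnerEntry run = lookup(run_sym);
    if (init == nullptr || run == nullptr) return -1;
    init_ = init;
    run_ = run;
    return 0;
  }
  // Init / Run: forward to whichever inner runtime was loaded last.
  int Init(void* args) const { return init_ ? init_(args) : -1; }
  int Run(void* args) const { return run_ ? run_(args) : -1; }

 private:
  InnerEntry init_ = nullptr;
  InnerEntry run_ = nullptr;
};

// Two hypothetical inner "runtime SOs". Each writes a distinct value into
// the int pointed to by args so the caller can tell which one ran.
static int RuntimeAInit(void* args) { *static_cast<int*>(args) = 10; return 0; }
static int RuntimeARun(void* args) { *static_cast<int*>(args) += 1; return 0; }
static int RuntimeBInit(void* args) { *static_cast<int*>(args) = 20; return 0; }
static int RuntimeBRun(void* args) { *static_cast<int*>(args) += 2; return 0; }

// Builds a fake dlsym over a fixed name -> entry table.
SymbolLookup MakeLookup(std::map<std::string, InnerEntry> table) {
  return [table = std::move(table)](const std::string& name) -> InnerEntry {
    auto it = table.find(name);
    return it == table.end() ? nullptr : it->second;
  };
}

// Loads one runtime through the dispatcher, then calls Init followed by Run.
// Returns the resulting state value, or -1 on any failure.
int LoadInitRun(Dispatcher& d, const SymbolLookup& lookup) {
  int state = 0;
  if (d.Load(lookup, "RuntimeInit", "RuntimeRun") != 0) return -1;
  if (d.Init(&state) != 0 || d.Run(&state) != 0) return -1;
  return state;
}
```

In this sketch, loading runtime A's table and calling Init then Run yields 11; loading runtime B's table through the same Dispatcher object yields 22. Because Load commits only after both symbols resolve, a load that is missing the run symbol fails with -1 and leaves runtime B in place.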
Current Progress
Done:
- … (src/common/aicpu_dispatcher/)
- LoadAicpuOp wrapper implemented (src/common/host/)
Blocker:
rtsBinaryLoadFromFile returns ACL_ERROR_RT_PARAM_INVALID (107000) on CANN 8.5.0 for any input, including CANN's own built-in aicpu_kernel.json. The root cause still needs investigation; likely candidates are a missing initialization step, an environment configuration issue, or a CANN version requirement.
Scope
| File | Change |
|---|---|
| src/common/aicpu_dispatcher/aicpu_dispatcher.{h,cpp,CMakeLists.txt} | New: dispatcher SO |
| src/common/host/load_aicpu_op.{h,cpp} | New: host-side new API wrapper |
| src/a2a3/platform/onboard/host/device_runner.{h,cpp} | Replace launch_aicpu_kernel with new API path |
| src/a5/platform/onboard/host/device_runner.{h,cpp} | Same |
| src/{a2a3,a5}/platform/onboard/host/pto_runtime_c_api.cpp | Set dispatcher SO path |
| src/{a2a3,a5}/platform/onboard/host/CMakeLists.txt | Add load_aicpu_op source + rts include path |
| python/simpler/runtime_compiler.py | Add dispatcher build target |
| simpler_setup/runtime_builder.py | Build dispatcher in parallel, dispatcher_path in RuntimeBinaries |
| examples/scripts/runtime_builder.py | Same |
Reference
- pypto dispatcher: framework/src/machine/device/machine_interface/pypto_aicpu_interface.{h,cpp}
- pypto host-side: framework/src/machine/runtime/load_aicpu_op.{h,cpp}