The host_build_graph runtime builds a static task graph on the host, copies the Runtime object to device memory, and lets AICPU scheduler threads dispatch tasks to AICore via a per-core handshake. Dependencies are explicit edges created by orchestration code, so scheduling is a standard fanin/fanout ready-queue model.
`Runtime` owns the task table, handshake buffers, and host-side device APIs. See `src/runtime/host_build_graph/runtime/runtime.h`. `Task` is a fixed-size record that stores `func_id`, an argument array, `fanin`, `fanout`, `core_type`, and `function_bin_addr`. `Handshake` is the shared per-core control block used by AICPU and AICore for dispatch and completion. `HostApi` provides the device memory operations used by host orchestration (`device_malloc`, `copy_to_device`, `upload_kernel_binary`, etc.).
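
As a rough illustration of the records described above, the following sketch lays out `Task` and `Handshake` as plain structs. Only the field names `func_id`, `fanin`, `fanout`, `core_type`, `function_bin_addr`, `task_status`, and `control` come from the text; the field types, the capacities `kMaxArgs` and `kMaxFanout`, the `num_args` and `fanout_count` counters, and the `is_initially_ready` helper are assumptions made for this example, not the definitions in `runtime.h`.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical capacities; the real limits are defined in runtime.h.
constexpr int kMaxArgs = 16;
constexpr int kMaxFanout = 8;

// Fixed-size task record (field types are assumptions).
struct Task {
    int32_t  func_id = -1;               // which kernel this task runs
    uint64_t args[kMaxArgs] = {};        // argument array (addresses or scalars)
    int32_t  num_args = 0;               // hypothetical: number of used args
    int32_t  fanin = 0;                  // count of unresolved predecessors
    int32_t  fanout[kMaxFanout] = {};    // ids of successor tasks
    int32_t  fanout_count = 0;           // hypothetical: number of used fanout slots
    int32_t  core_type = 0;              // which kind of core may run the task
    uint64_t function_bin_addr = 0;      // GM address of the kernel binary
};

// Per-core control block shared by AICPU and AICore (layout is an assumption).
struct Handshake {
    uint64_t task = 0;         // address of the dispatched Task
    int32_t  task_status = 0;  // 1 = dispatched, 0 = idle or completed
    int32_t  control = 0;      // 1 = core should shut down
};

// A fixed-size record with no pointers to host containers can be copied
// to device memory byte for byte.
static_assert(std::is_trivially_copyable<Task>::value,
              "Task must be copyable with a plain memcpy");

// Hypothetical helper: a task with no unresolved predecessors can run at once.
inline bool is_initially_ready(const Task& t) { return t.fanin == 0; }
```

Keeping the record fixed-size is what allows the whole `Runtime` to be copied to the device as one snapshot.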
- Python tooling compiles kernels and orchestration into shared objects.
- `init_runtime_impl` loads the orchestration SO from bytes, resolves the entry symbol, and registers kernel binaries with the platform uploader. The resulting GM addresses are stored by `Runtime::set_function_bin_addr`. See `src/runtime/host_build_graph/host/runtime_maker.cpp`.
- The orchestration function runs on the host and builds the graph. It allocates device buffers, copies input data to device, records output buffers with `record_tensor_pair(runtime, ...)`, adds tasks via `add_task(runtime, ...)`, and adds dependency edges via `add_successor(runtime, ...)`.
- The populated `Runtime` is copied to device memory by the platform layer. AICPU then runs the executor with this `Runtime` snapshot.
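
The graph-building step can be sketched with a small host-only mock. The names `add_task`, `add_successor`, and `get_initial_ready_tasks` appear in the text, but the signatures, the `Runtime` and `Task` layouts, and the `build_diamond` orchestration function below are assumptions for illustration. Device allocation and data copies are omitted, and the argument values are placeholders rather than real device addresses.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Simplified host-side mock; the real Task is a fixed-size record.
struct Task {
    int func_id = -1;
    std::vector<uint64_t> args;
    int fanin = 0;              // number of unresolved predecessors
    std::vector<int> fanout;    // ids of successor tasks
};

struct Runtime {
    std::vector<Task> tasks;
};

// Assumed signature: appends a task and returns its id.
int add_task(Runtime& rt, int func_id, std::vector<uint64_t> args) {
    Task t;
    t.func_id = func_id;
    t.args = std::move(args);
    rt.tasks.push_back(std::move(t));
    return static_cast<int>(rt.tasks.size()) - 1;
}

// Assumed signature: records the edge from -> to and bumps the successor's fanin.
void add_successor(Runtime& rt, int from, int to) {
    rt.tasks[from].fanout.push_back(to);
    rt.tasks[to].fanin += 1;
}

// Tasks with no predecessors seed the ready queue.
std::vector<int> get_initial_ready_tasks(const Runtime& rt) {
    std::vector<int> ready;
    for (int i = 0; i < static_cast<int>(rt.tasks.size()); ++i) {
        if (rt.tasks[i].fanin == 0) ready.push_back(i);
    }
    return ready;
}

// Hypothetical orchestration function: a diamond t0 -> {t1, t2} -> t3.
void build_diamond(Runtime& rt) {
    int t0 = add_task(rt, /*func_id=*/0, {0x1000});
    int t1 = add_task(rt, /*func_id=*/1, {0x1000, 0x2000});
    int t2 = add_task(rt, /*func_id=*/1, {0x1000, 0x3000});
    int t3 = add_task(rt, /*func_id=*/2, {0x2000, 0x3000, 0x4000});
    add_successor(rt, t0, t1);
    add_successor(rt, t0, t2);
    add_successor(rt, t1, t3);
    add_successor(rt, t2, t3);
}
```

In this mock only `t0` starts with a fanin of zero, so it is the single task returned by `get_initial_ready_tasks`.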
- `aicpu_executor.cpp` performs core discovery, handshake initialization, and ready-queue seeding using `Runtime::get_initial_ready_tasks`.
- Scheduler threads maintain per-core and global ready queues. When a task is ready, the scheduler writes its pointer to the core's `Handshake` and sets `task_status=1`.
- AICore reads the handshake, executes the kernel at `Task::function_bin_addr`, and writes `task_status=0` on completion.
- AICPU observes completion, resolves dependencies by decrementing fanin, and enqueues newly-ready tasks.
- The executor shuts down cores by setting `Handshake::control=1` after all tasks complete.
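
The dispatch and completion protocol above can be approximated by a single-threaded simulation. This is a sketch under simplifying assumptions, not the logic in `aicpu_executor.cpp`: there is one global ready queue instead of per-core queues, the AICore side is a stand-in function (`aicore_step`) that completes a task immediately, task ids are assumed to equal their index in the task table, and the `Task` and `Handshake` layouts are reduced to the fields the protocol needs. The function name `run_schedule` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

struct Task {
    int id;                   // assumed equal to the index in the task table
    int fanin;                // unresolved predecessors
    std::vector<int> fanout;  // successor ids
};

struct Handshake {
    const Task* task = nullptr;  // task currently assigned to this core
    int task_status = 0;         // 1 = dispatched, 0 = idle or completed
    int control = 0;             // 1 = shut down
};

// Stand-in for AICore: "executes" the kernel and signals completion.
void aicore_step(Handshake& hs) {
    if (hs.task_status == 1) {
        // The kernel at Task::function_bin_addr would run here.
        hs.task_status = 0;
    }
}

// Runs the graph to completion and returns the order in which tasks finished.
std::vector<int> run_schedule(std::vector<Task> tasks, int num_cores) {
    std::vector<Handshake> cores(num_cores);
    std::deque<int> ready;
    for (const Task& t : tasks) {
        if (t.fanin == 0) ready.push_back(t.id);  // seed the ready queue
    }

    std::vector<int> order;
    std::size_t completed = 0;
    while (completed < tasks.size()) {
        if (ready.empty()) break;  // nothing runnable: the graph has a cycle

        // Dispatch: hand a ready task to every idle core.
        for (Handshake& hs : cores) {
            if (hs.task == nullptr && !ready.empty()) {
                hs.task = &tasks[ready.front()];
                ready.pop_front();
                hs.task_status = 1;
            }
        }

        for (Handshake& hs : cores) aicore_step(hs);

        // Completion: decrement successors' fanin and enqueue newly ready tasks.
        for (Handshake& hs : cores) {
            if (hs.task != nullptr && hs.task_status == 0) {
                const Task* done = hs.task;
                hs.task = nullptr;
                order.push_back(done->id);
                ++completed;
                for (int succ : done->fanout) {
                    if (--tasks[succ].fanin == 0) ready.push_back(succ);
                }
            }
        }
    }

    for (Handshake& hs : cores) hs.control = 1;  // shut down all cores
    return order;
}
```

A real executor polls `task_status` concurrently from several scheduler threads, so it would also need appropriate memory ordering or cache maintenance on the shared `Handshake`; the simulation sidesteps that by running every step in one thread.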

`validate_runtime_impl` copies all recorded output tensors back to the host and frees device allocations recorded in tensor pairs. See `src/runtime/host_build_graph/host/runtime_maker.cpp`.
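
A minimal sketch of this copy-back step, assuming each recorded tensor pair stores a host pointer, a device pointer, and a byte size. The `TensorPair` layout, the `record_tensor_pair` signature, and the helpers `copy_from_device` and `device_free` are hypothetical stand-ins backed by ordinary host memory, so that the example runs without a device.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <vector>

// Assumed layout of a recorded output buffer.
struct TensorPair {
    void* host_ptr;
    void* dev_ptr;
    std::size_t size;  // bytes
};

struct Runtime {
    std::vector<TensorPair> tensor_pairs;
};

// Assumed signature for the recording call used during orchestration.
void record_tensor_pair(Runtime& rt, void* host, void* dev, std::size_t size) {
    rt.tensor_pairs.push_back(TensorPair{host, dev, size});
}

// Hypothetical host-memory stand-ins for the platform's device APIs.
void copy_from_device(void* host, const void* dev, std::size_t n) {
    std::memcpy(host, dev, n);
}
void device_free(void* dev) { std::free(dev); }

// Copies every recorded output back to the host, frees the device side,
// and returns the number of tensors copied.
int validate_runtime_sketch(Runtime& rt) {
    int copied = 0;
    for (TensorPair& p : rt.tensor_pairs) {
        copy_from_device(p.host_ptr, p.dev_ptr, p.size);
        device_free(p.dev_ptr);
        p.dev_ptr = nullptr;
        ++copied;
    }
    rt.tensor_pairs.clear();
    return copied;
}
```

Recording the pairs at orchestration time means the finalize step needs no knowledge of the graph itself; it only walks the list.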

- `src/runtime/host_build_graph/runtime/runtime.h`
- `src/runtime/host_build_graph/runtime/runtime.cpp`
- `src/runtime/host_build_graph/host/runtime_maker.cpp`
- `src/runtime/host_build_graph/aicpu/aicpu_executor.cpp`