
[SOT] add direct kernel fast runtime #79043

Open
SigureMo wants to merge 1 commit into PaddlePaddle:develop from cattidea:codex/sot-python-runtime


SigureMo commented May 18, 2026

PR Category

Performance Optimization

PR Types

New features, Performance

Description

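This PR adds a fast kernel runtime to the SOT dynamic-to-static execution path in order to reduce runtime scheduling overhead. The core goal is to bypass the per-op scheduling path of the original C++ program executor: once PIR has been lowered to the pd_kernel dialect, the runtime codegens a Python-executable function directly and calls the underlying kernels through the auto-generated core.eager.kernel_ops pybind API.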

Main changes:

  • Adds pir_fast_runtime.py, which generates Python runtime code from the lowered pd_kernel/CINN IR.
  • Adds a switchable fast kernel runtime path to SOT PartialProgramLayer.sot_call.
  • Adds direct kernel pybind API generation and metadata query support, covering ordinary PHI kernel calls.
  • Adds a Python entry point for CINN JIT kernel launch, used to call lowered CINN kernels on GPU.
  • Binds feeds/parameters directly to Python locals, without creating an extra Scope.
  • Raises an error for unsupported ops instead of falling back to run_program, so that problems are not hidden.
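The codegen idea in the list above can be sketched in miniature. This is a toy illustration under stated assumptions, not the code in pir_fast_runtime.py: the op tuples, `TOY_KERNELS`, `generate_source`, and `build_runtime` are all hypothetical names, and plain Python operators stand in for the real pybind kernels.

```python
import operator

# Toy stand-in for a table of directly callable kernels. These names are
# illustrative and are not the core.eager.kernel_ops API.
TOY_KERNELS = {"add": operator.add, "mul": operator.mul}


def generate_source(ops, input_names, output_names):
    """Emit Python source for a straight-line list of (op, result, operands)."""
    lines = [f"def run({', '.join(input_names)}):"]
    for op_name, result, operands in ops:
        if op_name not in TOY_KERNELS:
            # No silent fallback: an unsupported op is an error at codegen time.
            raise RuntimeError(f"unsupported op: {op_name}")
        args = ", ".join(operands)
        lines.append(f"    {result} = kernels[{op_name!r}]({args})")
    lines.append(f"    return [{', '.join(output_names)}]")
    return "\n".join(lines)


def build_runtime(ops, input_names, output_names):
    """Compile the generated source once and return the callable."""
    source = generate_source(ops, input_names, output_names)
    namespace = {"kernels": TOY_KERNELS}
    exec(source, namespace)
    return namespace["run"]


# (x + y) * w as two "kernel" calls; inputs arrive as plain function locals.
toy_ops = [("add", "t0", ["x", "y"]), ("mul", "t1", ["t0", "w"])]
run = build_runtime(toy_ops, ["x", "y", "w"], ["t1"])
print(run(1.0, 2.0, 3.0))
```

In this sketch the inputs are ordinary function arguments and intermediates are ordinary local variables, which loosely mirrors the "bind to Python locals, no extra Scope" point; the real runtime operates on tensors and lowered IR rather than floats and tuples.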

Verification

  • prek --files python/paddle/jit/dy2static/pir_fast_runtime.py test/sot/test_fast_kernel_runtime.py
  • python -m unittest test_fast_kernel_runtime
  • ctest -R sot --output-on-failure
  • The remote GPU build completed and passed both the core.eager.kernel_ops.add smoke test and test_fast_kernel_runtime.
  • PR-CI-SOT / Build and Test has passed in GitHub CI.

Performance results

All benchmarks below exclude the first-run codegen/CINN compile time.

GPU:

  • single_add: sync 22.866us -> 19.605us, enqueue 18.336us -> 14.357us.
  • reshape_add: sync 60.321us -> 53.518us, enqueue 55.852us -> 48.256us.
  • resnet18_b1: sync 0.930ms -> 0.800ms, enqueue 0.824ms -> 0.739ms.
  • resnet18_b10: sync 0.999ms -> 0.855ms, enqueue 0.912ms -> 0.813ms.

CPU:

  • single_add: 3.589us -> 1.501us.
  • reshape_add: 18.617us -> 5.620us.
  • resnet18_b1: 8.627ms -> 8.096ms.
  • resnet18_b10: 78.468ms -> 76.781ms.
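For readers who want the relative gains rather than raw latencies, the figures above can be turned into speedup ratios. The snippet below is only a convenience calculation over the numbers quoted in this description (CPU and GPU sync columns); the helper names are made up for illustration.

```python
# Latencies copied from the benchmark lists above, as (before, after) pairs.
CPU_RESULTS = {
    "single_add": (3.589, 1.501),      # us
    "reshape_add": (18.617, 5.620),    # us
    "resnet18_b1": (8.627, 8.096),     # ms
    "resnet18_b10": (78.468, 76.781),  # ms
}
GPU_SYNC_RESULTS = {
    "single_add": (22.866, 19.605),    # us
    "reshape_add": (60.321, 53.518),   # us
    "resnet18_b1": (0.930, 0.800),     # ms
    "resnet18_b10": (0.999, 0.855),    # ms
}


def speedup(before, after):
    """Ratio of old latency to new latency (units cancel)."""
    return before / after


def reduction_percent(before, after):
    """Latency saved, as a percentage of the old latency."""
    return 100.0 * (before - after) / before


for label, table in (("CPU", CPU_RESULTS), ("GPU sync", GPU_SYNC_RESULTS)):
    for name, (before, after) in table.items():
        print(
            f"{label} {name}: {speedup(before, after):.2f}x, "
            f"{reduction_percent(before, after):.1f}% lower"
        )
```

On these numbers the relative gain is largest for the small CPU cases and smallest for the batched ResNet-18 CPU run, which is what one would expect if the savings come mostly from per-op dispatch overhead rather than kernel time.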

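Does this cause precision changes?

No. This PR only changes the runtime scheduling path for SOT-compiled graphs; kernel inputs and outputs stay consistent with the lowered IR. The new unit tests cover direct kernels, CINN kernels, BatchNorm eval, and the prohibition on run_program fallback.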

Copilot AI review requested due to automatic review settings May 18, 2026 16:30

paddle-bot Bot commented May 18, 2026

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.


Copilot AI left a comment


Pull request overview

This PR introduces a "direct kernel fast runtime" path for SOT (Symbolic Opcode Translator) in Paddle. When the new SOT_ENABLE_FAST_KERNEL_CODEGEN environment variable is enabled, sot_call lowers the PIR program to the pd_kernel dialect and code-generates a Python function that directly invokes per-op pybind kernels (newly exposed under core.eager.kernel_ops), bypassing the run_program executor. The path intentionally does not fall back to run_program and raises a RuntimeError if autograd is active or required metadata is missing.
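The gating and no-fallback behavior described in this overview can be illustrated with a small stand-alone sketch. Only the flag name SOT_ENABLE_FAST_KERNEL_CODEGEN comes from the PR; `ToyPartialProgram`, `fast_path_enabled`, the accepted flag values, and the error message are assumptions made for illustration and do not reflect Paddle's actual classes or env parsing.

```python
import os

FLAG_NAME = "SOT_ENABLE_FAST_KERNEL_CODEGEN"  # flag name as given in the PR


def fast_path_enabled(environ=None):
    # Assumed parsing rule for illustration; Paddle's own env helper may differ.
    environ = os.environ if environ is None else environ
    return environ.get(FLAG_NAME, "0").strip().lower() in {"1", "true", "on"}


class ToyPartialProgram:
    """Hypothetical stand-in for the object that owns sot_call."""

    def __init__(self, build_fast_runtime, run_program):
        self._build_fast_runtime = build_fast_runtime
        self._run_program = run_program
        self._fast_runtime = None  # built lazily, then reused

    def sot_call(self, inputs, grad_enabled=False, environ=None):
        if not fast_path_enabled(environ):
            return self._run_program(inputs)
        if grad_enabled:
            # Flag is on: refuse rather than quietly using run_program.
            raise RuntimeError("fast kernel runtime does not support autograd")
        if self._fast_runtime is None:
            self._fast_runtime = self._build_fast_runtime()
        return self._fast_runtime(inputs)


builds = []


def build():
    builds.append(1)  # count how often codegen would run
    return lambda inputs: [x + 1 for x in inputs]


program = ToyPartialProgram(build, run_program=lambda inputs: list(inputs))
flag_on = {FLAG_NAME: "1"}
print(program.sot_call([1, 2], environ=flag_on))
print(program.sot_call([3], environ=flag_on))
print(len(builds))
```

The point of the design, as far as the overview states it, is that once the flag is on a failure surfaces as an exception instead of being masked by the slower executor path; the cached callable means codegen cost is paid once per program.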

Changes:

  • Adds a codegen mode (--direct_kernel) to python_c_gen.py plus new CMake wiring that produces kernel_op_function.{h,cc} exposing direct kernel APIs and a get_kernel_ops_args_info registry under core.eager.kernel_ops.
  • Adds manual pybind helpers in kernel_op_function_manual.{h,cc} (incl. get_phi_kernel_op_info and a run_cinn_jit_kernel launcher), exposes apply_pd_op_to_kernel_pass in pir.cc, and binds the new submodule from eager.cc.
  • Adds pir_fast_runtime.py implementing the lowering + Python source codegen, plumbs it into PirPartialProgram.sot_call via a cached FastKernelRuntime, and adds unit tests in test/sot/test_fast_kernel_runtime.py.

Reviewed changes

Copilot reviewed 13 out of 14 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| python/paddle/jit/sot/utils/envs.py | Adds SOT_ENABLE_FAST_KERNEL_CODEGEN boolean env flag. |
| python/paddle/jit/dy2static/pir_partial_program.py | Routes sot_call to the fast kernel path when enabled; adds runtime cache and quick_index_map handling for Value raw inputs. |
| python/paddle/jit/dy2static/pir_fast_runtime.py | New module that lowers PIR to pd_kernel and codegens a Python kernel-dispatch function with constant folding and CINN jit_kernel support. |
| paddle/fluid/pybind/pir.cc | Exposes apply_pd_op_to_kernel_pass to Python. |
| paddle/fluid/pybind/eager.{h,cc} | Declares and creates the core.eager.kernel_ops submodule and binds both generated and manual direct-kernel functions. |
| paddle/fluid/pybind/kernel_op_function_manual.{h,cc} | Implements get_phi_kernel_op_info and a CINN jit_kernel launcher used by the fast runtime. |
| paddle/fluid/pybind/CMakeLists.txt, .gitignore, paddle/fluid/pir/dialect/CMakeLists.txt | Build wiring for generated kernel_op_function.* files. |
| paddle/fluid/eager/auto_code_generator/generator/python_c_gen.py | Extends the Python-C generator with a --direct_kernel mode, custom namespaces, args-info registry, and configurable bind/method/error names. |
| test/sot/test_fast_kernel_runtime.py | Unit tests covering pybind surface, fast kernel codegen for add/reshape/BN, and the no-fallback contract. |
| .gitignore | Ignores generated kernel_op_function.* outputs. |


Co-authored-by: Codex <noreply@openai.com>
@SigureMo SigureMo force-pushed the codex/sot-python-runtime branch from e40ec78 to c06fe94 Compare May 18, 2026 19:21
@codecov-commenter

Codecov Report

❌ Patch coverage is 64.37346% with 145 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@3fc714d). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| python/paddle/jit/dy2static/pir_fast_runtime.py | 62.71% | 132 Missing ⚠️ |
| python/paddle/jit/dy2static/pir_partial_program.py | 75.00% | 13 Missing ⚠️ |

❌ Your patch status has failed because the patch coverage (64.37%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             develop   #79043   +/-   ##
==========================================
  Coverage           ?   64.37%           
==========================================
  Files              ?        3           
  Lines              ?      407           
  Branches           ?        0           
==========================================
  Hits               ?      262           
  Misses             ?      145           
  Partials           ?        0           
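The headline percentage follows directly from the hit and miss counts in the report. The check below is a reader-side calculation, not Codecov output; the last figure assumes the total number of patch lines stays fixed, which will not hold if the patch changes.

```python
import math

# Figures copied from the Codecov report above.
hits, misses = 262, 145
missing_by_file = {
    "python/paddle/jit/dy2static/pir_fast_runtime.py": 132,
    "python/paddle/jit/dy2static/pir_partial_program.py": 13,
}
target_percent = 90.0

total_lines = hits + misses
patch_coverage = 100.0 * hits / total_lines

# Extra covered lines needed to reach the target if the line count is unchanged.
lines_needed = math.ceil(target_percent / 100.0 * total_lines) - hits

print(f"total lines: {total_lines}")
print(f"patch coverage: {patch_coverage:.5f}%")
print(f"missing lines accounted for: {sum(missing_by_file.values())}")
print(f"additional covered lines needed for {target_percent:.0f}%: {lines_needed}")
```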

