docs: add AGENTS.md #45
Open: voltjia wants to merge 2 commits into `master` from `chore/add-agents-md`.
# AGENTS.md

## Project Overview

InfiniOps is a cross-platform operator library supporting multiple backends: **CPU**, **Nvidia**, **MetaX**, **Iluvatar**, **Moore**, **Cambricon**, and more. The codebase uses a template-based architecture where each platform provides its own specialization.

## Build & Test

```bash
python -m pip install .[dev]
python -m pytest -n 1
```

To run CI across all supported platforms on remote machines:

```bash
python .ci/agent.py run --branch <branch-name>
```

To run local code on the current machine:

```bash
python .ci/run.py --local
```

CI must pass on **all** platforms. Build failures are blockers; test failures may be pre-existing.

## Code Architecture

### `Runtime<Device::Type>` Template

`Runtime<Device::Type>` is the unified runtime abstraction for all platforms. Each specialization lives in `<platform>/runtime_.h` and wraps the platform's native APIs with a consistent interface:

- `Stream` — type alias for the platform's stream/queue type (e.g. `cudaStream_t`, `cnrtQueue_t`).
- `kDeviceType` — `static constexpr Device::Type` identifying the platform.
- `Malloc`, `Free`, `Memcpy`, `MemcpyHostToDevice` — thin wrappers around the platform's runtime APIs.

**`Runtime`, `Blas`, and similar classes/structs must only contain direct bindings to platform APIs** (e.g. what you find in `cuda_runtime.h` or `cublas_v2.h`). Helpers like `GetOptimalBlockSize`, `GetDataType`, or `GetComputeType` must live in separate classes — they are derived logic, not API bindings.

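As a rough sketch of this shape (all names here are illustrative; a real specialization binds the platform's own APIs such as `cudaMalloc`), a CPU-style `Runtime` might look like:

```cpp
#include <cstdlib>
#include <cstring>

namespace infini::ops {

struct Device {
  enum class Type { kCpu, kNvidia };
};

// Primary template; each `<platform>/runtime_.h` provides a specialization.
template <Device::Type>
struct Runtime;

// Hypothetical CPU specialization: thin, direct bindings to the C runtime.
// No derived logic lives here, per the rule above.
template <>
struct Runtime<Device::Type::kCpu> {
  static constexpr Device::Type kDeviceType = Device::Type::kCpu;

  static void* Malloc(std::size_t size) { return std::malloc(size); }

  static void Free(void* ptr) { std::free(ptr); }

  static void Memcpy(void* dst, const void* src, std::size_t size) {
    std::memcpy(dst, src, size);
  }
};

}  // namespace infini::ops
```

A GPU specialization would keep exactly the same member names but forward to the platform's runtime calls instead, which is what lets shared code stay platform-agnostic.
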
### CRTP Interface Enforcement

`Runtime` specializations inherit from a CRTP base to declare their interface level, then call `static_assert(Runtime<...>::Validate())` after the struct is complete. This catches signature mismatches at compile time, analogous to the `override` keyword.

- `RuntimeBase<Derived>` — requires `kDeviceType` only (e.g. CPU).
- `DeviceRuntime<Derived>` — additionally requires `Stream` and more.

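A minimal sketch of the pattern (the member check here is illustrative; the real bases validate full signatures):

```cpp
#include <type_traits>

namespace infini::ops {

enum class DeviceType { kCpu, kNvidia };  // stand-in for `Device::Type`

// Hypothetical CRTP base: `Validate()` is instantiated only when called,
// so `Derived` is already complete by the time the checks run.
template <typename Derived>
struct RuntimeBase {
  static constexpr bool Validate() {
    static_assert(
        std::is_same_v<decltype(Derived::kDeviceType), const DeviceType>,
        "`kDeviceType` must be a `static constexpr DeviceType`.");
    return true;
  }
};

struct CpuRuntime : RuntimeBase<CpuRuntime> {
  static constexpr DeviceType kDeviceType = DeviceType::kCpu;
};

// Checked after the struct is complete, analogous to `override`.
static_assert(CpuRuntime::Validate());

}  // namespace infini::ops
```

If `kDeviceType` were missing or had the wrong type, the `static_assert` inside `Validate()` would fire at compile time with a readable message instead of a confusing error deep inside shared code.
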
### Device-Aware Type System

The type system is split across several layers:

- **`data_type.h`** — declares `TypeMap<dev, dtype>`, `IsFP16<kDev, T>`, `IsBFloat16<kDev, T>`, and common type lists.
- **`<platform>/data_type_.h`** — provides `TypeMap` specializations mapping `kFloat16`/`kBFloat16` to platform-specific C++ types (e.g. `half`, `__nv_bfloat16`). Also defines platform-specific type aliases like `cuda_bfloat16`.
- **`caster.h`** — forward-declares `Caster<Device::Type>`.
- **`<platform>/caster.cuh`** — specializes `Caster` for each platform. CUDA-like platforms inherit from `CudaCasterImpl` in `cuda/caster.cuh`.
- **`dispatcher.h`** — `DispatchFunc` for device-aware runtime dispatch over type and block-size lists.

Use `IsFP16<kDev, T>` / `IsBFloat16<kDev, T>` instead of `std::is_same_v<T, half>` in shared headers. Use `Caster<kDev>::template Cast<T>()` for type conversions.

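The layering can be sketched as follows (the mapping targets here are mocks; the real `<platform>/data_type_.h` maps to types like `half`):

```cpp
#include <cstdint>
#include <type_traits>

namespace infini::ops {

enum class DeviceType { kCpu, kNvidia };
enum class DataType { kFloat16, kFloat32 };

// Declared in `data_type.h`; each platform specializes it.
template <DeviceType dev, DataType dtype>
struct TypeMap;

// Hypothetical CPU mapping: FP16 as a raw 16-bit pattern, FP32 as `float`.
template <>
struct TypeMap<DeviceType::kCpu, DataType::kFloat16> {
  using type = std::uint16_t;
};

template <>
struct TypeMap<DeviceType::kCpu, DataType::kFloat32> {
  using type = float;
};

// Device-aware trait: shared headers ask `IsFP16<kDev, T>` rather than
// hard-coding one platform's FP16 type with `std::is_same_v<T, half>`.
template <DeviceType dev, typename T>
inline constexpr bool IsFP16 =
    std::is_same_v<T, typename TypeMap<dev, DataType::kFloat16>::type>;

}  // namespace infini::ops
```

Because the trait routes through `TypeMap`, the same shared header gives the right answer on every platform without ever naming a platform type.
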
### Include-Order Independence

Shared headers under `src/cuda/` must **not** reference platform-specific types or intrinsics directly (e.g. `half`, `__float2half`, `cuda_bfloat16`). Instead, use dependent template expressions like `Caster<kDev>::template Cast<T>()` so that name lookup is deferred to instantiation time (C++ two-phase lookup). This ensures shared headers are self-contained and include-order-independent.

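The difference can be sketched without any CUDA headers (mock `Caster`; the real one is specialized in `<platform>/caster.cuh`):

```cpp
namespace infini::ops {

enum class DeviceType { kCpu, kNvidia };

// Forward declaration only, as in `caster.h`.
template <DeviceType dev>
struct Caster;

// Shared-header template: `Caster<kDev>::template Cast<T>(x)` is a
// dependent expression, so the name is looked up only at instantiation
// time, after the platform's `Caster` specialization has been seen.
// A direct call like `__float2half(x)` would be a non-dependent name,
// resolved at definition time, and would break include-order independence.
template <DeviceType kDev, typename T>
T Convert(float x) {
  return Caster<kDev>::template Cast<T>(x);
}

// Hypothetical CPU specialization, normally seen via a platform header.
template <>
struct Caster<DeviceType::kCpu> {
  template <typename T>
  static T Cast(float x) {
    return static_cast<T>(x);
  }
};

}  // namespace infini::ops
```

Note that `Convert` compiles even though only a forward declaration of `Caster` is visible at its definition; the specialization is only needed where `Convert` is actually instantiated.
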
### Platform-Specific Directory Structure

Platform-specific code **must** stay in its own directory under `src/<platform>/`. Do **not** lift platform-specific files (e.g. `device_.h`, casters, type mappings) into a common directory. The key architectural principle of the device-aware refactoring is separation into platform-specific directories.

### Moore Polyfills

Moore (MUSA) lacks some CUDA intrinsics. `src/moore/polyfills.cuh` provides replacements (e.g. `hrcp`, `__hadd`). This file must include `<musa_fp16.h>` and `<musa_bf16.h>` **before** defining polyfill macros, to prevent macro collisions with system headers.

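A host-side mock of the ordering issue (stand-in types only; the real file targets `__half` from `<musa_fp16.h>`):

```cpp
// Mock stand-in for the MUSA half type; everything here is illustrative.
using mock_half = float;

namespace infini::ops {

// Polyfill body for an intrinsic the platform lacks.
inline mock_half hrcp(mock_half x) { return 1.0f / x; }

}  // namespace infini::ops

// The macro is defined only AFTER all platform headers have been included.
// If `<musa_fp16.h>` were included below this line, the macro would rewrite
// any `hrcp` declaration inside that header and break the build.
#define hrcp infini::ops::hrcp
```

After the `#define`, any later use of `hrcp` in kernel code transparently calls the polyfill, which is why the include order is load-bearing.
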
## Code Style

See [CONTRIBUTING.md](CONTRIBUTING.md) for the full style guide. Key rules for agents:

- Follow the [Google C++ Style Guide](https://google.github.io/styleguide/cppguide.html) strictly. Use the default `.clang-format`.
- Keep changes minimal — do not add what is not necessary.
- Do not use exceptions. Use `assert` for error handling.
- Use absolute includes (e.g. `#include "cambricon/common.h"`) instead of relative includes.
- Include guards should match the filename exactly, including underscores (e.g. `device_.h` -> `INFINI_OPS_<PLATFORM>_DEVICE__H_` with double underscore).
- Wrap platform includes that need ordering protection with `// clang-format off` / `// clang-format on`.
- Leave a blank line between consecutive `using` type alias declarations.
- One blank line between classes, functions, and members within a class.
- Comments and error messages must be in English, use complete sentences (capitalized, ending with punctuation), and use Markdown backticks around identifiers.
- Error messages follow [LLVM Coding Standards](https://llvm.org/docs/CodingStandards.html#error-and-warning-messages).
- Kernel file naming: `kernel` for custom kernels, `kernel_v2`/`kernel_v3` for variants, algorithm name for well-known algorithms (e.g. `flash_attention_v2`), library name for library-based implementations (e.g. `blas`).
- Separate kernel (`.cuh`) from kernel launcher (`.h`).

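Several of these rules can be seen together in a hypothetical platform header (file name, guard name, and aliases are all illustrative, not actual project code):

```cpp
// Hypothetical `src/cambricon/device_.h`: the guard matches the filename
// `device_.h` exactly, hence the double underscore before `H_`.
#ifndef INFINI_OPS_CAMBRICON_DEVICE__H_
#define INFINI_OPS_CAMBRICON_DEVICE__H_

// clang-format off
#include <cstdint>  // A real platform include needing ordering protection goes here.
// clang-format on

namespace infini::ops {

// A blank line separates consecutive `using` type alias declarations.
using StreamAlias = std::uint64_t;  // Placeholder for a platform stream type.

using QueueAlias = std::uint32_t;  // Placeholder for a platform queue type.

}  // namespace infini::ops

#endif  // INFINI_OPS_CAMBRICON_DEVICE__H_
```
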
## Commit & PR Conventions

See [CONTRIBUTING.md](CONTRIBUTING.md) for full details.

- Commit messages follow [Conventional Commits](https://www.conventionalcommits.org/).
- Concise single-line commit messages with Markdown backticks around identifiers.
- Each commit should do a single job — keep commits logically separated.
- PR titles follow the same Conventional Commits format.
- Small PRs should be squashed; large PRs may keep multiple commits if each is meaningful.
- Branch names: `<type>/xxx-yyyy-zzzz` (hyphen-separated, type matches PR title).
- Do **not** include AI as a co-author.

## Known Pitfalls

- **Non-dependent name lookup in NVCC**: Direct CUDA intrinsics like `__float2half()` are non-dependent names resolved at template definition time, before platform headers may be available. Wrap them in `Caster<kDev>` to make them dependent.
- **Moore `hrcp` macro**: The `#define hrcp infini::ops::hrcp` in `polyfills.cuh` will collide with `musa_fp16_mtgpu.h` if the MUSA FP16 header is included after the macro definition.
Review comment: Consider adding a note that this runs on remote machines. Also add a note about running local code on the current machine:

```bash
python .ci/run.py --local
```
Reply: Fixed.