refactor(pr66-simplify): correct rstd_out semantic name + clarity fixes

Post-merge /simplify review findings applied:
- **`AddRmsNorm` param rename** (`src/base/add_rms_norm.h` + 3 Ascend kernels + test):
`rstd_out` → `residual_out`. The slot actually holds `xOut` (the
`input + other` residual sum) per `aclnnAddRmsNorm`'s API; the
reciprocal-std values live in the private internal `rstd_tensor_`
buffer, so the prior name was misleading.
- **Generator shim for `apply_rotary_pos_emb`** (`scripts/generate_wrappers.py`):
the shim forwarded `head_size` positionally as `rotary_dim`; that value
is now bound to a named local, `rotary_dim_shim`, with a comment noting
that the legacy shim assumes full rotary (`rotary_dim == head_size`).
- **`kernel_sincos_cache.h` leak comment**: TODO → FIXME with persistent-worker
impact call-out. Actual fix still blocked on undocumented input-address
index layout for `aclnnRopeWithSinCosCache`.
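
To illustrate the distinction behind the `rstd_out` → `residual_out` rename, here is a minimal pure-Python sketch of a fused add + RMSNorm on a single vector. It is not the `aclnnAddRmsNorm` signature or the kernel code; the function name `add_rms_norm_reference`, the `eps` placement, and the list-based layout are assumptions made for illustration. It only shows that the residual sum (`xOut`) and the reciprocal standard deviation are two different quantities.

```python
import math


def add_rms_norm_reference(x, other, weight, eps=1e-6):
    """Hypothetical reference: returns (y, residual_out, rstd) for one vector."""
    # residual_out corresponds to xOut: the elementwise sum input + other.
    residual_out = [a + b for a, b in zip(x, other)]

    # rstd is the reciprocal root-mean-square of that sum (a scalar per row).
    mean_sq = sum(v * v for v in residual_out) / len(residual_out)
    rstd = 1.0 / math.sqrt(mean_sq + eps)

    # The normalized output scales the residual sum by rstd and the weight.
    y = [v * rstd * w for v, w in zip(residual_out, weight)]
    return y, residual_out, rstd


y, residual_out, rstd = add_rms_norm_reference(
    [1.0, 2.0], [2.0, 2.0], [1.0, 1.0], eps=0.0
)
```

In this sketch `residual_out` has the same shape as the input, while `rstd` is a single scalar per normalized row, which is why labeling the former as `rstd_out` would mislead a reader.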

Skipped findings: reviewer false positives on `src/base/rotary_embedding.h`
members (all consumed by kernels) and `max_seq_len_` (used in constructor
body). Larger refactors (`UploadCosSinCache` + `IndexSelect` helpers, ~100
lines copy-paste) deferred to a follow-up PR.
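
The full-rotary assumption noted for the `apply_rotary_pos_emb` shim can be sketched as follows. This is not the code emitted by `scripts/generate_wrappers.py`; `rotate_prefix` and `legacy_apply_rotary` are hypothetical names, and the interleaved pairing with a single shared angle is a simplification chosen only to keep the example short.

```python
import math


def rotate_prefix(vec, angle, rotary_dim):
    """Rotate consecutive pairs in vec[:rotary_dim]; leave the tail untouched."""
    out = list(vec)
    c, s = math.cos(angle), math.sin(angle)
    for i in range(0, rotary_dim, 2):
        a, b = vec[i], vec[i + 1]
        out[i] = a * c - b * s
        out[i + 1] = a * s + b * c
    return out


def legacy_apply_rotary(vec, angle, head_size):
    """Hypothetical legacy entry point that has no rotary_dim parameter."""
    # Legacy shim: assumes full rotary, i.e. rotary_dim == head_size.
    rotary_dim_shim = head_size
    return rotate_prefix(vec, angle, rotary_dim=rotary_dim_shim)


full = legacy_apply_rotary([1.0, 0.0, 1.0, 0.0], math.pi / 2, head_size=4)
partial = rotate_prefix([1.0, 0.0, 1.0, 0.0], math.pi / 2, rotary_dim=2)
```

Here `full` rotates every pair, while `partial` leaves the last two elements unchanged. A caller that needs partial rotary therefore cannot go through the legacy path, which is the limitation the named local and its comment make visible.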