Conversation
| // - HyPE (RoPE on linear layers; NoPE on sparse layers) | ||
| class MiniCPMSALAForCausalLM : public InfinilmModel { | ||
| public: | ||
| MiniCPMSALAForCausalLM(std::shared_ptr<infinilm::config::ModelConfig> model_config, |
There was a problem hiding this comment.
https://github.com/pengcheng888/InfiniLM/blob/main/csrc/models/minicpm_sala/minicpm_sala_for_causal_lm.hpp 请参考接口。移除rank_info和 attention_backend 参数。
| private: | ||
| INFINICORE_NN_MODULE(MiniCPMSALAModel, model); | ||
| INFINICORE_NN_MODULE(infinilm::layers::linear::ReplicatedLinear, lm_head); | ||
| INFINICORE_NN_MODULE(infinicore::nn::Linear, lm_head); |
There was a problem hiding this comment.
使用infinilm::layers::linear::ReplicatedLinear, infinicore::nn::Linear不再使用
| std::unique_ptr<cache::CacheConfig> cache_config_; | ||
| }; | ||
|
|
||
| std::shared_ptr<infinilm::config::ModelConfig> create_minicpm_sala_model_config(std::shared_ptr<infinilm::config::ModelConfig> model_config); |
There was a problem hiding this comment.
实现这个create_minicpm_sala_model_config函数。
| const cache::CacheConfig *MiniCPMSALAForCausalLM::get_cache_config() const { | ||
| return cache_config_.get(); | ||
| } | ||
|
|
There was a problem hiding this comment.
kvcache创建在 minicpm_sala_allocate_kv_cache_tensors.cpp文件中
|
|
||
| } // namespace infinilm::models::minicpm_sala | ||
|
|
||
| namespace { |
|
|
||
| class MiniCPMSALADecoderLayer : public infinicore::nn::Module { | ||
| public: | ||
| MiniCPMSALADecoderLayer(std::shared_ptr<infinilm::config::ModelConfig> model_config, |
There was a problem hiding this comment.
There was a problem hiding this comment.
移除MiniCPMSALADecoderLayer的rank_info和attention_backend参数
| std::optional<infinicore::Tensor> cu_seqlens, | ||
| std::optional<infinicore::Tensor> block_tables, | ||
| std::optional<infinicore::Tensor> slot_mapping) const; | ||
|
|
There was a problem hiding this comment.
移除多余的参数,forward只需要(const infinicore::Tensor &positions,
infinicore::Tensor &hidden_states,
infinicore::Tensor &residual);
| std::optional<infinicore::Tensor> block_tables, | ||
| std::optional<infinicore::Tensor> slot_mapping) const; | ||
|
|
||
| void set_rotary_emb(const std::shared_ptr<infinicore::nn::RoPE> &rotary_emb); |
There was a problem hiding this comment.
移除set_rotary_emb和reset_cache函数
| #include "../../backends/attention_backends.hpp" | ||
| #include "../../cache/kv_cache.hpp" | ||
| #include "../../config/model_config.hpp" | ||
| #include "../../engine/distributed/distributed.hpp" |
| #include "models_registry.hpp" | ||
| #include "llama/llama.hpp" | ||
| #include "minicpm_sala/minicpm_sala_for_causal_lm.hpp" | ||
|
|
|
|
||
| #include "../global_state/global_state.hpp" | ||
| #include "../models/model_factory.hpp" | ||
| #include "../models/models_registry.hpp" |
There was a problem hiding this comment.
新增模型,不要修改框架层面上的代码。不能修改该文件
| const std::string model_type = model_config->get<std::string>("model_type"); | ||
| const auto &config_map = models::get_model_config_map(); | ||
| auto it = config_map.find(model_type); | ||
| if (it != config_map.end()) { |
There was a problem hiding this comment.
新增模型,不要修改框架层面上的代码。不能修改该文件
|
|
||
| #include <algorithm> | ||
| #include <limits> | ||
| #include <memory> |
|
|
||
| void MiniCPMSALAModel::reset_cache(const cache::CacheConfig *cache_config) { | ||
| if (cache_config == nullptr) { | ||
| kv_cache_minicpm4_ = nullptr; |
There was a problem hiding this comment.
kvcache创建的代码在csrc/models/minicpm_sala/minicpm_sala_allocate_kv_cache_tensors.cpp中
| if (auto static_cfg = dynamic_cast<const cache::StaticKVCacheConfig *>(cache_config)) { | ||
| // Allocate separate caches by KV shape to avoid per-layer padding copies. |
| INFINICORE_NN_MODULE_INIT(o_gate, hidden_size_, num_attention_heads * head_dim_, | ||
| model_config->get_quantization_method(), use_bias_, dtype, device); | ||
| } | ||
| void MiniCPMSALAAttention::set_rotary_emb(const std::shared_ptr<infinicore::nn::RoPE> &rotary_emb) { |
| std::optional<infinicore::Tensor> cu_seqlens, | ||
| std::optional<infinicore::Tensor> block_tables, | ||
| std::optional<infinicore::Tensor> slot_mapping) const; | ||
|
|
| INFINICORE_NN_MODULE_INIT(mlp, model_config, device); | ||
| } | ||
|
|
||
| void MiniCPMSALADecoderLayer::set_rotary_emb(const std::shared_ptr<infinicore::nn::RoPE> &rotary_emb) { |
| void MiniCPMSALADecoderLayer::reset_cache() { | ||
| self_attn_->reset_cache(); |
|
|
||
| auto to_device = [&](const std::optional<infinicore::Tensor> &t) | ||
| -> std::optional<infinicore::Tensor> { | ||
| return t.has_value() ? t.value()->to(device) : t; |
Signed-off-by: Ceng23333 <441651826@qq.com>
Signed-off-by: Ceng23333 <441651826@qq.com>
Signed-off-by: Ceng23333 <441651826@qq.com>
Signed-off-by: Ceng23333 <441651826@qq.com>
Signed-off-by: Ceng23333 <441651826@qq.com>
| void reset_cache(const cache::CacheConfig *cache_config) override; | ||
|
|
||
| protected: | ||
| const cache::CacheConfig *get_cache_config() const override; |
There was a problem hiding this comment.
get_cache_config()属于 infinimodel的抽象类了,移除具体模型中的get_cache_config函数
| INFINICORE_NN_MODULE(MiniCPMSALAModel, model); | ||
| INFINICORE_NN_MODULE(infinilm::layers::linear::ReplicatedLinear, lm_head); | ||
| INFINICORE_NN_MODULE(infinicore::nn::Linear, lm_head); | ||
| std::unique_ptr<cache::CacheConfig> cache_config_; |
| MiniCPMSALAModel(std::shared_ptr<infinilm::config::ModelConfig> model_config, | ||
| const infinicore::Device &device, | ||
| engine::distributed::RankInfo rank_info = engine::distributed::RankInfo(), | ||
| backends::AttentionBackend attention_backend = backends::AttentionBackend::Default); |
There was a problem hiding this comment.
移除MiniCPMSALAModel的rank_info和attention_backend参数
| engine::distributed::RankInfo rank_info = engine::distributed::RankInfo(), | ||
| backends::AttentionBackend attention_backend = backends::AttentionBackend::Default); | ||
|
|
||
| infinicore::Tensor forward(const infinicore::Tensor &input_ids, |
There was a problem hiding this comment.
移除past_sequence_lengths total_sequence_lengths input_offsets cu_seqlens block_tables slot_mapping 这些参数。上面是attn_metadata的数据,只在attn计算时用到,不再一层一层地传递。
| std::optional<infinicore::Tensor> block_tables, | ||
| std::optional<infinicore::Tensor> slot_mapping) const; | ||
|
|
||
| void reset_cache(const cache::CacheConfig *cache_config); |
There was a problem hiding this comment.
reset_cache 属于 CausalLM类,移除。
| INFINICORE_NN_MODULE(infinicore::nn::Embedding, embed_tokens); | ||
| INFINICORE_NN_MODULE_VEC(MiniCPMSALADecoderLayer, layers); | ||
| INFINICORE_NN_MODULE(infinicore::nn::RMSNorm, norm); | ||
| INFINICORE_NN_MODULE(infinicore::nn::RoPE, rotary_emb); |
There was a problem hiding this comment.
移除rotary_emb。 infinicore::nn::RoPE的对象在 minicpm_sala_attention类中,通过get_rope创建
| infinicore::Tensor forward(const infinicore::Tensor &hidden_states, | ||
| const infinicore::Tensor &position_ids, | ||
| std::shared_ptr<infinilm::cache::Cache> kv_cache, | ||
| std::optional<infinicore::Tensor> past_sequence_lengths, |
There was a problem hiding this comment.
移除forward的这些 attn_metadata参数
| std::optional<infinicore::Tensor> slot_mapping) const; | ||
|
|
||
| void set_rotary_emb(const std::shared_ptr<infinicore::nn::RoPE> &rotary_emb); | ||
| void reset_cache(); |
|
|
||
| kv_cache_minicpm4_ = (minicpm4_layer_count > 0) |
There was a problem hiding this comment.
根据minicpm_sala_allocate_kv_cache_tensors.cpp文件创建kvcache。 kv_cache_minicpm4_和kv_cache_lightning_两个变量可以合并成一个
#294