
issue/294: minicpm-sala model#295

Open
Ceng23333 wants to merge 5 commits into main from minicpm-sala

Conversation

@Ceng23333 (Contributor)

@Ceng23333 Ceng23333 requested a review from a team on April 8, 2026 at 08:12
// - HyPE (RoPE on linear layers; NoPE on sparse layers)
class MiniCPMSALAForCausalLM : public InfinilmModel {
public:
MiniCPMSALAForCausalLM(std::shared_ptr<infinilm::config::ModelConfig> model_config,
Collaborator:
Remove the rank and backend parameters from the interface.

private:
INFINICORE_NN_MODULE(MiniCPMSALAModel, model);
INFINICORE_NN_MODULE(infinilm::layers::linear::ReplicatedLinear, lm_head);
INFINICORE_NN_MODULE(infinicore::nn::Linear, lm_head);
Collaborator:
Use infinilm::layers::linear::ReplicatedLinear; infinicore::nn::Linear is no longer used.

Collaborator:
This needs to be changed as well.

std::unique_ptr<cache::CacheConfig> cache_config_;
};

std::shared_ptr<infinilm::config::ModelConfig> create_minicpm_sala_model_config(std::shared_ptr<infinilm::config::ModelConfig> model_config);
Collaborator:
Implement this create_minicpm_sala_model_config function.

const cache::CacheConfig *MiniCPMSALAForCausalLM::get_cache_config() const {
return cache_config_.get();
}

Collaborator:
KV-cache creation belongs in the minicpm_sala_allocate_kv_cache_tensors.cpp file.


} // namespace infinilm::models::minicpm_sala

namespace {
Collaborator:
Add the model registration.


class MiniCPMSALADecoderLayer : public infinicore::nn::Module {
public:
MiniCPMSALADecoderLayer(std::shared_ptr<infinilm::config::ModelConfig> model_config,
Collaborator:
Remove the rank_info and attention_backend parameters from MiniCPMSALADecoderLayer.

std::optional<infinicore::Tensor> cu_seqlens,
std::optional<infinicore::Tensor> block_tables,
std::optional<infinicore::Tensor> slot_mapping) const;

Collaborator:
Remove the redundant parameters; forward only needs (const infinicore::Tensor &positions,
infinicore::Tensor &hidden_states,
infinicore::Tensor &residual);

std::optional<infinicore::Tensor> block_tables,
std::optional<infinicore::Tensor> slot_mapping) const;

void set_rotary_emb(const std::shared_ptr<infinicore::nn::RoPE> &rotary_emb);
Collaborator:
Remove the set_rotary_emb and reset_cache functions.

Collaborator:
Remove the set_rotary_emb function.

#include "../../backends/attention_backends.hpp"
#include "../../cache/kv_cache.hpp"
#include "../../config/model_config.hpp"
#include "../../engine/distributed/distributed.hpp"
Collaborator:
Split attention into two classes.

#include "models_registry.hpp"
#include "llama/llama.hpp"
#include "minicpm_sala/minicpm_sala_for_causal_lm.hpp"

Collaborator:
Do not modify any code in this file.


#include "../global_state/global_state.hpp"
#include "../models/model_factory.hpp"
#include "../models/models_registry.hpp"
Collaborator:
This adds a new model; do not modify framework-level code. This file must not be changed.

const std::string model_type = model_config->get<std::string>("model_type");
const auto &config_map = models::get_model_config_map();
auto it = config_map.find(model_type);
if (it != config_map.end()) {
Collaborator:
This adds a new model; do not modify framework-level code. This file must not be changed.


#include <algorithm>
#include <limits>
#include <memory>
Collaborator:
Create the KV cache the new way.


void MiniCPMSALAModel::reset_cache(const cache::CacheConfig *cache_config) {
if (cache_config == nullptr) {
kv_cache_minicpm4_ = nullptr;
Collaborator:
The KV-cache creation code belongs in csrc/models/minicpm_sala/minicpm_sala_allocate_kv_cache_tensors.cpp.

Comment on lines +77 to +78
if (auto static_cfg = dynamic_cast<const cache::StaticKVCacheConfig *>(cache_config)) {
// Allocate separate caches by KV shape to avoid per-layer padding copies.
Collaborator:
Create the KV cache the new way.

INFINICORE_NN_MODULE_INIT(o_gate, hidden_size_, num_attention_heads * head_dim_,
model_config->get_quantization_method(), use_bias_, dtype, device);
}
void MiniCPMSALAAttention::set_rotary_emb(const std::shared_ptr<infinicore::nn::RoPE> &rotary_emb) {
Collaborator:
Delete the set_rotary_emb function.

std::optional<infinicore::Tensor> cu_seqlens,
std::optional<infinicore::Tensor> block_tables,
std::optional<infinicore::Tensor> slot_mapping) const;

Collaborator:
Delete the set_rotary_emb function.

INFINICORE_NN_MODULE_INIT(mlp, model_config, device);
}

void MiniCPMSALADecoderLayer::set_rotary_emb(const std::shared_ptr<infinicore::nn::RoPE> &rotary_emb) {
Collaborator:
Delete the set_rotary_emb function.

Comment on lines +41 to +42
void MiniCPMSALADecoderLayer::reset_cache() {
self_attn_->reset_cache();
@pengcheng888 (Collaborator), Apr 8, 2026:
Delete the reset_cache function.


auto to_device = [&](const std::optional<infinicore::Tensor> &t)
-> std::optional<infinicore::Tensor> {
return t.has_value() ? t.value()->to(device) : t;
Collaborator:
Do not modify this.

Signed-off-by: Ceng23333 <441651826@qq.com>
void reset_cache(const cache::CacheConfig *cache_config) override;

protected:
const cache::CacheConfig *get_cache_config() const override;
Collaborator:
get_cache_config() now belongs to the InfinilmModel abstract class; remove the get_cache_config function from the concrete model.

INFINICORE_NN_MODULE(MiniCPMSALAModel, model);
INFINICORE_NN_MODULE(infinilm::layers::linear::ReplicatedLinear, lm_head);
INFINICORE_NN_MODULE(infinicore::nn::Linear, lm_head);
std::unique_ptr<cache::CacheConfig> cache_config_;
Collaborator:
Remove the cache_config_ member.

MiniCPMSALAModel(std::shared_ptr<infinilm::config::ModelConfig> model_config,
const infinicore::Device &device,
engine::distributed::RankInfo rank_info = engine::distributed::RankInfo(),
backends::AttentionBackend attention_backend = backends::AttentionBackend::Default);
Collaborator:
Remove the rank_info and attention_backend parameters from MiniCPMSALAModel.

engine::distributed::RankInfo rank_info = engine::distributed::RankInfo(),
backends::AttentionBackend attention_backend = backends::AttentionBackend::Default);

infinicore::Tensor forward(const infinicore::Tensor &input_ids,
Collaborator:
Remove the past_sequence_lengths, total_sequence_lengths, input_offsets, cu_seqlens, block_tables, and slot_mapping parameters. These are attn_metadata fields, used only in the attention computation; they should not be threaded down layer by layer.

std::optional<infinicore::Tensor> block_tables,
std::optional<infinicore::Tensor> slot_mapping) const;

void reset_cache(const cache::CacheConfig *cache_config);
Collaborator:
reset_cache belongs to the CausalLM class; remove it.

INFINICORE_NN_MODULE(infinicore::nn::Embedding, embed_tokens);
INFINICORE_NN_MODULE_VEC(MiniCPMSALADecoderLayer, layers);
INFINICORE_NN_MODULE(infinicore::nn::RMSNorm, norm);
INFINICORE_NN_MODULE(infinicore::nn::RoPE, rotary_emb);
Collaborator:
Remove rotary_emb. The infinicore::nn::RoPE object lives in the minicpm_sala_attention class and is created via get_rope.

infinicore::Tensor forward(const infinicore::Tensor &hidden_states,
const infinicore::Tensor &position_ids,
std::shared_ptr<infinilm::cache::Cache> kv_cache,
std::optional<infinicore::Tensor> past_sequence_lengths,
Collaborator:
Remove these attn_metadata parameters from forward.

std::optional<infinicore::Tensor> slot_mapping) const;

void set_rotary_emb(const std::shared_ptr<infinicore::nn::RoPE> &rotary_emb);
void reset_cache();
Collaborator:
Remove reset_cache.

Comment on lines +89 to +90

kv_cache_minicpm4_ = (minicpm4_layer_count > 0)
Collaborator:
Create the KV cache per the minicpm_sala_allocate_kv_cache_tensors.cpp file. The kv_cache_minicpm4_ and kv_cache_lightning_ variables can be merged into one.
