[Bug]: Inference fails on older transformers versions due to DynamicLayer import and incorrect past_key_values initialization #339

@falcon-xu

Is there an existing issue?

  • I have searched, and there is no existing issue.

Describe the bug

🐛 Bug Description

When running inference on MiniCPM4 with transformers==4.49.0, the model raises an ImportError when the modeling code is loaded and, once that import is worked around, a ValueError on the first forward pass. Together these errors prevent inference in environments that have not upgraded to a recent transformers version.

🔍 Root Cause Analysis

We identified two distinct but related issues, one at model initialization and one in the first forward pass:

1. DynamicLayer Import Error
DynamicLayer was introduced in transformers version 4.54.1. In older versions (e.g., 4.49.0), importing it directly from transformers.cache_utils fails as soon as the modeling file is loaded:

ImportError: cannot import name 'DynamicLayer' from 'transformers.cache_utils' 
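
A version-tolerant import is one way to avoid this crash. The following is a minimal sketch, not the code from the linked PR: the flag name `HAS_NATIVE_DYNAMIC_LAYER` is hypothetical, and the fallback class is only a placeholder standing in for the vendored definition that would live in modeling_minicpm.py.

```python
# Guarded import: use the library class when it exists, otherwise fall back.
try:
    from transformers.cache_utils import DynamicLayer  # present only in newer releases
    HAS_NATIVE_DYNAMIC_LAYER = True  # hypothetical flag name, for illustration
except ImportError:
    # Reached when transformers is too old to ship DynamicLayer
    # (or when transformers itself cannot be imported).
    HAS_NATIVE_DYNAMIC_LAYER = False

    class DynamicLayer:
        """Placeholder fallback (assumption): the real vendored definition
        would be copied here instead of this stub."""

        def __init__(self):
            # Key/value tensors start empty and are filled by the model.
            self.keys = None
            self.values = None
```

With this pattern the rest of the file can refer to `DynamicLayer` unconditionally, and the flag can be used to log which implementation is active.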

2. past_key_values Initialization Logic Flaw

During the first forward pass, past_key_values is None because no cache exists yet. However, the current logic in MiniCPMModel.forward (around line 1940) treats None as a legacy tuple cache, since isinstance(None, Cache) evaluates to False, and raises:

ValueError: You must use the new past_key_values format, such as the Cache class, instead of the old tuple format

🛠️ Proposed Solution

To maximize backward compatibility without forcing users to upgrade transformers (which might break other dependencies), we propose two minimal fixes:

  1. Self-Contained Cache Classes:
    • Embed the CacheLayerMixin and DynamicLayer definitions directly in modeling_minicpm.py as a fallback. On newer transformers versions the fallback is not needed.
  2. Refined Cache Check:
    • Update the validation logic in the forward method to accept past_key_values being None on the first pass, and initialize an InfLLMv2Cache or DynamicCache in that case.
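
The refined check can be illustrated with a self-contained sketch. It assumes stand-in `Cache` and `DynamicCache` classes instead of the real transformers ones, and the function names `prepare_cache_buggy` and `prepare_cache_fixed` are hypothetical; the actual code in MiniCPMModel.forward may differ.

```python
LEGACY_ERROR = (
    "You must use the new past_key_values format, such as the Cache class, "
    "instead of the old tuple format"
)


class Cache:
    """Stand-in for transformers.cache_utils.Cache (assumption)."""


class DynamicCache(Cache):
    """Stand-in for transformers.cache_utils.DynamicCache (assumption)."""


def prepare_cache_buggy(past_key_values, use_cache=True):
    # Mirrors the reported behaviour: None is not a Cache instance,
    # so the first forward pass is rejected as if it were a legacy tuple.
    if use_cache and not isinstance(past_key_values, Cache):
        raise ValueError(LEGACY_ERROR)
    return past_key_values


def prepare_cache_fixed(past_key_values, use_cache=True, cache_factory=DynamicCache):
    if not use_cache:
        return past_key_values
    if past_key_values is None:
        # First pass: no cache yet, so create one instead of raising.
        return cache_factory()
    if not isinstance(past_key_values, Cache):
        # Only a genuine legacy (tuple) cache reaches this branch.
        raise ValueError(LEGACY_ERROR)
    return past_key_values
```

Passing `cache_factory` as a parameter is just one way to let the caller choose between a sparse-attention cache and a plain DynamicCache; the real selection logic is model-specific.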

PR: openbmb/MiniCPM4-8B · Fix: resolve transformers version compatibility for DynamicLayer and cache initialization

To Reproduce

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained(
    "openbmb/MiniCPM4-8B",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("openbmb/MiniCPM4-8B", trust_remote_code=True)
prompt = "GitHub community standards dictate clear code reproduction."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    use_cache=True,
)

Expected behavior

Inference completes normally, with no ImportError or ValueError.

Screenshots

No response

Environment

- **Model:** MiniCPM4-8B / MiniCPM4-0.5B
- **Transformers Version:** <= 4.54.0 (e.g., 4.49.0)
- **PyTorch Version:** not specified

Additional context

No response
