
[Bug]: head_first parameter incompatible with flash-attn >=0.5.0 #344

@z1ying

Is there an existing issue?

  • I have searched, and there is no existing issue.

Describe the bug

I am currently integrating MiniCPM-SALA into vLLM for high-performance inference.

During testing, I found that the function chunk_simple_gla in the Hugging Face modeling code is called with the parameter head_first.

Flash-attn version >= 0.5.0 no longer supports this parameter, which causes errors when using the latest flash-attn for vLLM integration.

To Reproduce

  1. Install flash-attn >=0.5.0.
  2. Load MiniCPM-SALA using HF Transformers with trust_remote_code=True.
  3. Run the model through the vLLM integration, or call chunk_simple_gla manually. The call fails because head_first is no longer a valid parameter in flash-attn >=0.5.0.

Actual: TypeError: unexpected keyword argument 'head_first'.
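
The failure mode can be shown without the real kernel. The snippet below is only a sketch: chunk_simple_gla_new is a hypothetical stand-in whose signature is invented for illustration (it is not the library's actual signature). It just shows how passing a removed keyword produces this kind of TypeError.

```python
# Hypothetical stand-in for a kernel whose newer release dropped `head_first`.
# The real chunk_simple_gla signature is NOT reproduced here.
def chunk_simple_gla_new(q, k, v, scale=None):
    return "ok"


try:
    # The modeling code still passes the old keyword.
    chunk_simple_gla_new("q", "k", "v", head_first=False)
except TypeError as exc:
    message = str(exc)
    print(message)
```

The printed message names head_first as the unexpected keyword, which matches the error reported above.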

Expected behavior

Would it be possible to update the Hugging Face modeling file to remove the head_first parameter (or make it optional) so the model is compatible with flash-attn >= 0.5.0?

Expected: model runs correctly.
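
One way to make the parameter optional is a small compatibility shim in the modeling file. The sketch below is not the maintainers' fix: drop_unsupported_kwargs, old_kernel and new_kernel are hypothetical names, and it assumes the newer kernel expects the same tensor layout that head_first=False used to select. It also assumes inspect.signature can read the kernel's signature, which may not hold for every wrapped or compiled function.

```python
import inspect
from functools import wraps


def drop_unsupported_kwargs(fn, names=("head_first",)):
    """Wrap fn so that keywords it no longer accepts are dropped (hypothetical helper)."""
    params = inspect.signature(fn).parameters
    accepts_var_kw = any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )
    # Keywords from `names` that fn cannot receive at all.
    unsupported = set() if accepts_var_kw else {n for n in names if n not in params}

    @wraps(fn)
    def wrapper(*args, **kwargs):
        for name in unsupported:
            value = kwargs.pop(name, None)
            # Assumption: a truthy value requested behavior the new kernel
            # cannot provide, so fail loudly instead of silently ignoring it.
            if value:
                raise ValueError(
                    f"{fn.__name__} no longer supports {name}={value!r}"
                )
        return fn(*args, **kwargs)

    return wrapper


# Stand-ins for the old and new kernel signatures (illustrative only).
def old_kernel(q, k, v, head_first=False):
    return ("old", head_first)


def new_kernel(q, k, v):
    return ("new",)


safe_old = drop_unsupported_kwargs(old_kernel)
safe_new = drop_unsupported_kwargs(new_kernel)
print(safe_old("q", "k", "v", head_first=False))
print(safe_new("q", "k", "v", head_first=False))
```

With a shim like this, the same call site works against both signatures, and a head_first=True request against the newer signature raises instead of silently computing on the wrong layout.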

Thank you in advance for your help!

Labels: bug, triage
