Skip to content

视觉推理直接传入文件路径出错 #37

@reeered

Description

@reeered

Windows环境下,执行下面的代码

from dashscope import MultiModalConversation

local_path1 = "test_video_frames/frame_0000.jpg"
local_path2 = "test_video_frames/frame_0001.jpg"
local_path3 = "test_video_frames/frame_0002.jpg"
local_path4 = "test_video_frames/frame_0003.jpg"

image_path1 = f"file://{local_path1}"
image_path2 = f"file://{local_path2}"
image_path3 = f"file://{local_path3}"
image_path4 = f"file://{local_path4}"

messages = [{"role": "system",
                "content": [{"text": "You are a helpful assistant."}]},
                {'role':'user',
                # 若模型属于Qwen2.5-VL系列且传入图像列表时,可设置fps参数,表示图像列表是由原视频每隔 1/fps 秒抽取的,其他模型设置则不生效
                'content': [{'video': [image_path1,image_path2,image_path3,image_path4],"fps":2},
                            {'text': '这段视频描绘的是什么景象?'}]}]

response = MultiModalConversation.call(
    api_key=api_key,
    model='qwen-vl-max-latest', 
    messages=messages)

print(response)

输出结果:

{"status_code": 400, "request_id": "948fecd8-3d82-9e8e-a68b-afb9da049b52", "code": "InvalidParameter.DataInspection", "message": "The media format is not supported or incorrect for the data inspection.", "output": null, "usage": null}

此外,如果模型选择qwen-vl-max,输出的response则为:

{"status_code": 400, "request_id": "100e23e9-6fe9-9d70-92b5-12de2fd50793", "code": "InvalidParameter", "message": "<400> InternalError.Algo.InvalidParameter: The provided URL does not appear to be valid. Ensure it is correctly formatted.", "output": null, "usage": null}

改为使用base64编码则无以上问题。但文档中提到”Base64编码会增加数据体积,以文件路径方式传输时,稳定性更高,建议优先使用该方式“,所以希望能够修复此问题。

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions