Skip to content

Latest commit

 

History

History
100 lines (88 loc) · 3.03 KB

File metadata and controls

100 lines (88 loc) · 3.03 KB

MiniCPM-V 4.5 - SGLang Documentation

1. Installing SGLang

Install SGLang from Source Code

git clone https://github.com/sgl-project/sglang.git
cd sglang

pip install --upgrade pip
pip install -e "python[all]"

Installing flashinfer Dependencies

Method 1: pip installation (network speed may be insufficient)

pip install flashinfer -i https://flashinfer.ai/whl/cu121/torch2.4/

Method 2: whl file installation

  • Visit: https://flashinfer.ai/whl/cu121/torch2.4/flashinfer/
  • Locate and download the whl file compatible with your server, e.g. flashinfer-0.1.6+cu121torch2.4-cp310-cp310-linux_x86_64.whl
  • Install using pip:
    pip install flashinfer-0.1.6+cu121torch2.4-cp310-cp310-linux_x86_64.whl

For any installation issues, please consult the official installation documentation

2. Launching Inference Service with sglang

By default, it downloads model files from Hugging Face Hub

python -m sglang.launch_server --model-path openbmb/MiniCPM-V-4_5 --port 30000

Alternatively, you can specify a local path after the --model-path parameter

python -m sglang.launch_server --model-path your_model_path --port 30000 --trust-remote-code

3. Service API Calls

  • Bash call

    curl -s http://localhost:30000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "MiniCPM-V-4.5",
        "messages": [
        {
            "role": "user",
            "content": [
            {
                "type": "text",
                "text": "What's in this image?"
            },
            {
                "type": "image_url",
                "image_url": {
                "url": "https://github.com/OpenSQZ/MiniCPM-o-cookbook/blob/main/inference/assets/airplane.jpeg?raw=true"
                }
            }
            ]
        }
        ],
        "max_tokens": 300
    }'
  • Python call

    from openai import OpenAI
    
    client = OpenAI(base_url=f"http://localhost:30000/v1", api_key="None")
    
    response = client.chat.completions.create(
        model="MiniCPM-V-4.5",
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "What is in this image?",
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://github.com/OpenSQZ/MiniCPM-o-cookbook/blob/main/inference/assets/airplane.jpeg?raw=true",
                        },
                    },
                ],
            }
        ],
        max_tokens=300,
    )
    
    print(response.choices[0].message.content)

If the image_url is inaccessible, it can be replaced with a local image path

For more calling methods, please refer to the SGLang documentation