
Google Vertex AI

Access Claude, Gemini, and Model-as-a-Service (MaaS) models through Google Cloud's Vertex AI platform.

Configuration

Vertex AI uses Google Cloud OAuth2 authentication with service accounts.

Service Account (Recommended)

Environment Variables:

GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"
GOOGLE_CLOUD_PROJECT="your-project-id"
GOOGLE_CLOUD_REGION="global"

Provider Options:

ReqLLM.generate_text(
  "google_vertex:claude-sonnet-4-5@20250929",
  "Hello",
  provider_options: [
    service_account_json: "/path/to/service-account.json",
    project_id: "your-project-id",
    region: "global"
  ]
)

Model Specs

For the full model-spec workflow, see Model Specs.

Use exact Vertex model IDs from LLMDB.xyz when possible. For MaaS and other OpenAI-compatible Vertex models that are not in the registry yet, build a full explicit model spec with ReqLLM.model!/1. Some MaaS model IDs also need extra.family when the family cannot be inferred from the ID alone.

Provider Options

Passed via the :provider_options keyword:

service_account_json

  • Type: String (file path)
  • Purpose: Path to Google Cloud service account JSON file
  • Fallback: GOOGLE_APPLICATION_CREDENTIALS env var
  • Example: provider_options: [service_account_json: "/path/to/credentials.json"]

access_token

  • Type: String
  • Purpose: Use an existing OAuth2 access token generated outside ReqLLM (e.g., via Goth or gcloud)
  • Behavior: Bypasses the service account JSON flow and internal token management
  • Example: provider_options: [access_token: "your-access-token"]
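If you already manage tokens with a library such as Goth, you can fetch one and pass it through directly. A sketch, assuming a Goth server named MyApp.Goth is already running in your supervision tree (the name is illustrative):

```elixir
# Fetch a current OAuth2 token from an existing Goth server (hypothetical
# server name) and hand it to ReqLLM, bypassing the service-account flow.
{:ok, %{token: token}} = Goth.fetch(MyApp.Goth)

ReqLLM.generate_text(
  "google_vertex:claude-sonnet-4-5@20250929",
  "Hello",
  provider_options: [
    access_token: token,
    project_id: "your-project-id"
  ]
)
```

With access_token set, ReqLLM does not refresh the token itself; refreshing remains your (or Goth's) responsibility.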

project_id

  • Type: String
  • Purpose: Google Cloud project ID
  • Fallback: GOOGLE_CLOUD_PROJECT env var
  • Example: provider_options: [project_id: "my-project-123"]
  • Required: Yes

region

  • Type: String
  • Default: "global"
  • Purpose: GCP region for Vertex AI endpoint
  • Example: provider_options: [region: "us-central1"]
  • Note: Use "global" for newest models, specific regions for regional deployment

additional_model_request_fields

  • Type: Map
  • Purpose: Model-specific request fields (e.g., thinking configuration)
  • Example:
    provider_options: [
      additional_model_request_fields: %{
        thinking: %{type: "enabled", budget_tokens: 4096}
      }
    ]

labels

  • Type: Map of strings to strings
  • Purpose: Custom metadata labels attached to the request. Used by Google Cloud for billing and reporting — labels are filterable in billing reports and BigQuery exports.
  • Constraints: Up to 64 labels per request; keys 1–63 chars starting with a lowercase letter; keys and values may only contain lowercase letters, numbers, underscores, and dashes.
  • Availability: Vertex AI only — the direct Gemini API (generativelanguage.googleapis.com) does not support this field.
  • Example:
    provider_options: [
      labels: %{
        "team" => "engineering",
        "environment" => "production",
        "use_case" => "contract_analysis"
      }
    ]
  • Reference: Custom metadata labels

Claude-Specific Options

Vertex AI supports the same Claude options as the native Anthropic provider:

anthropic_top_k

  • Type: 1..40
  • Purpose: Sample from top K options per token
  • Example: provider_options: [anthropic_top_k: 20]

stop_sequences

  • Type: List of strings
  • Purpose: Custom stop sequences
  • Example: provider_options: [stop_sequences: ["END", "STOP"]]

anthropic_metadata

  • Type: Map
  • Purpose: Request metadata for tracking
  • Example: provider_options: [anthropic_metadata: %{user_id: "123"}]

thinking

  • Type: Map
  • Purpose: Enable extended thinking/reasoning
  • Example: provider_options: [thinking: %{type: "enabled", budget_tokens: 4096}]
  • Access: ReqLLM.Response.thinking(response)
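Putting the option and the accessor together, a round-trip sketch (model ID and prompt are illustrative):

```elixir
# Enable extended thinking with a token budget, then read both the
# reasoning trace and the final answer back from the response.
{:ok, response} =
  ReqLLM.generate_text(
    "google_vertex:claude-sonnet-4-5@20250929",
    "Is 1987 a prime number? Explain your reasoning.",
    provider_options: [
      thinking: %{type: "enabled", budget_tokens: 4096}
    ]
  )

ReqLLM.Response.thinking(response)  # extended reasoning text
ReqLLM.Response.text(response)      # final answer
```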

anthropic_prompt_cache

  • Type: Boolean
  • Purpose: Enable prompt caching
  • Example: provider_options: [anthropic_prompt_cache: true]

anthropic_prompt_cache_ttl

  • Type: String (e.g., "1h")
  • Purpose: Cache TTL (defaults to ~5 minutes if omitted)
  • Example: provider_options: [anthropic_prompt_cache_ttl: "1h"]
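The two caching options are typically combined: anthropic_prompt_cache turns caching on and anthropic_prompt_cache_ttl extends how long the cached prefix lives. A sketch (model ID and prompt are illustrative):

```elixir
# Cache the prompt prefix for one hour instead of the default ~5 minutes.
ReqLLM.generate_text(
  "google_vertex:claude-sonnet-4-5@20250929",
  "Hello",
  provider_options: [
    anthropic_prompt_cache: true,
    anthropic_prompt_cache_ttl: "1h"
  ]
)
```

The TTL only matters when caching is enabled; setting anthropic_prompt_cache_ttl alone has no effect.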

Supported Models

Claude 4.5 Family

  • Haiku 4.5: google_vertex:claude-haiku-4-5@20251001

    • Fast, cost-effective
    • Full tool calling and reasoning support
  • Sonnet 4.5: google_vertex:claude-sonnet-4-5@20250929

    • Balanced performance and capability
    • Extended thinking support
  • Opus 4.1: google_vertex:claude-opus-4-1@20250805

    • Highest capability
    • Advanced reasoning

Claude 4.0 & Earlier

  • Sonnet 4.0: google_vertex:claude-sonnet-4@20250514
  • Opus 4.0: google_vertex:claude-opus-4@20250514
  • Sonnet 3.7: google_vertex:claude-3-7-sonnet@20250219
  • Sonnet 3.5 v2: google_vertex:claude-3-5-sonnet@20241022
  • Haiku 3.5: google_vertex:claude-3-5-haiku@20241022

Model ID Format

Vertex uses the @ symbol for versioning:

  • Format: claude-{tier}-{version}@{date}
  • Example: claude-sonnet-4-5@20250929

Wire Format Notes

  • Authentication: OAuth2 with service account tokens (auto-refreshed)
  • Endpoint: Model-specific paths under aiplatform.googleapis.com
  • API: Uses Anthropic's raw message format (compatible with native API)
  • Streaming: Standard Server-Sent Events (SSE)
  • Region routing: Global endpoint for newest models, regional for specific deployments

All of these differences are handled automatically by ReqLLM.

Resources