Skip to content

[bot] Add Azure AI Inference Python SDK integration for ChatCompletionsClient and EmbeddingsClient instrumentation #481

@braintrust-bot

Description

@braintrust-bot

Summary

The Azure AI Inference Python SDK (azure-ai-inference) is Microsoft's official client for accessing AI models through Azure AI Foundry serverless endpoints, GitHub Models, managed compute endpoints, and the Azure OpenAI Service. It exposes ChatCompletionsClient, EmbeddingsClient, and ImageEmbeddingsClient for executing inference against a broad catalog of models (Meta Llama 3.3, Mistral Large, DeepSeek-R1, Microsoft Phi-4, Cohere Command R, and others) that are hosted on Azure but are not accessible through the standard OpenAI Python client.

This repository has zero instrumentation for any azure-ai-inference execution surface — no integration directory, no wrapper, no patcher, no auto_instrument() support. Users who call ChatCompletionsClient.complete() or EmbeddingsClient.embed() directly get no Braintrust spans.

The SDK cannot be wrapped with wrap_openai() because ChatCompletionsClient is a distinct class with its own authentication (Azure API keys or Entra ID) and its own request/response types from the azure.ai.inference.models namespace. wrap_openai() requires an openai.OpenAI instance.

The Braintrust docs list "Azure AI Foundry" as a supported cloud provider, but this coverage is provided through the AI Proxy gateway (using an OpenAI client pointed at the Braintrust gateway URL), not through native azure-ai-inference SDK tracing. Users who follow Microsoft's official documentation for Azure AI Foundry (pip install azure-ai-inference) and call ChatCompletionsClient.complete() directly get zero Braintrust spans.

What needs to be instrumented

The azure-ai-inference package (v1.0.0b9) exposes these execution surfaces, none of which are instrumented:

Chat completions (highest priority)

SDK Method Description Streaming
ChatCompletionsClient.complete(messages, ...) Chat completions via Azure AI Foundry / GitHub Models No
ChatCompletionsClient.complete(messages, stream=True, ...) Streaming chat completions StreamingChatCompletions iterator
AsyncChatCompletionsClient.complete(...) Async chat completions No
AsyncChatCompletionsClient.complete(..., stream=True) Async streaming chat completions AsyncStreamingChatCompletions

Response shape: ChatCompletions with choices[0].message.content, choices[0].finish_reason, usage.prompt_tokens, usage.completion_tokens, usage.total_tokens, model, id. Mirrors the OpenAI response shape in structure but is a distinct Azure type.

Streaming: StreamingChatCompletions is an iterable of StreamingChatCompletionsUpdate objects with choices[0].delta.content. The integration must accumulate deltas and finalize the span when iteration completes.

Embeddings

SDK Method Description
EmbeddingsClient.embed(input, ...) Generate embeddings for a list of texts
AsyncEmbeddingsClient.embed(input, ...) Async embeddings

Return type: EmbeddingsResult with data[0].embedding (list of floats) and usage.prompt_tokens.

Implementation notes

Authentication: Uses Azure API key (AzureKeyCredential) or Entra ID (DefaultAzureCredential). VCR cassettes need api-key header sanitization.

Endpoint-per-model pattern: Unlike OpenAI where a single client accesses all models, each Azure AI Foundry deployment has its own endpoint URL. The model name is embedded in the endpoint or returned in the response. Span metadata should capture model from ChatCompletions.model.

GitHub Models support: The same ChatCompletionsClient is used for GitHub Models (with endpoint="https://models.inference.ai.azure.com" and a GitHub token). GitHub Models provides free-tier access to GPT-4o, Llama, Mistral, and others for prototyping.

Parameters relevant for span metadata: model (or inferred from endpoint), temperature, max_tokens, top_p, frequency_penalty, presence_penalty, seed, tools, response_format, stop.

No coverage in any instrumentation layer

  • No integration directory (py/src/braintrust/integrations/azure_ai_inference/)
  • No wrapper function (e.g. wrap_azure_ai_inference())
  • No patcher in any existing integration
  • No nox test session (test_azure_ai_inference)
  • No version entry in py/src/braintrust/integrations/versioning.py
  • No mention in py/src/braintrust/integrations/__init__.py

A grep for azure.ai, azure-ai-inference, or azure_ai_inference across py/src/braintrust/ returns zero matches.

Braintrust docs status

unclear — The Braintrust AI providers page lists "Azure AI Foundry" as a supported cloud provider, but the integration is through the AI Proxy gateway (routing openai.AzureOpenAI or openai.OpenAI through the Braintrust gateway URL), not through a native azure-ai-inference SDK wrapper. Users following Microsoft's official azure-ai-inference quickstart docs get zero native Braintrust tracing.

Upstream references

Local repo files inspected

  • py/src/braintrust/integrations/ — no azure_ai_inference/ directory on main
  • py/src/braintrust/wrappers/ — no Azure AI Inference wrapper
  • py/noxfile.py — no test_azure_ai_inference session
  • py/pyproject.toml [tool.braintrust.matrix] — no azure-ai-inference entry
  • py/src/braintrust/integrations/__init__.py — Azure AI Inference not listed
  • py/src/braintrust/integrations/versioning.py — no Azure AI Inference version matrix
  • Full repo grep for azure.ai, azure-ai-inference, azure_ai_inference — zero matches in SDK source

Metadata

Metadata

Assignees

No one assigned
    No fields configured for Feature.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions