LLM Token Counter Using Azure Functions in Azure API Management

Self-hosted or open-source models often lack a way to track usage metrics. This example demonstrates how to count the tokens in a chat completion using Azure Functions and Azure API Management.

Key components:

An Azure Function to count chat completion tokens
A send-request policy in Azure API Management to call the Azure Function
An emit-metric policy in Azure API Management to send the total token count to Azure Application Insights as a custom metric

This example uses the Llama-3-8B model (though it does return token counts in its responses), and the same approach can be applied to other models.

AutoTokenizer

A Hugging Face AutoTokenizer is used to count chat completion tokens. The tokenizer is loaded from the model hub by specifying the model name. If your chosen model is gated, you must request access on its model page first. Llama 3 models are gated, so this is a prerequisite.

Deploy Azure Function

Using a Managed Identity for the Storage Account with Azure Functions is highly recommended: (https://techcommunity.microsoft.com/blog/appsonazureblog/use-managed-identity-instead-of-azurewebjobsstorage-to-connect-a-function-app-to/3657606)

func azure functionapp publish llama-counter-jay

Azure API Management Policy

In order to emit custom metrics to Azure App Insights, we need to configure App Insights as a diagnostic setting in Azure API Management.

The following policy is an example of how to emit metrics to App Insights.

<policies>
    <!-- Throttle, authorize, validate, cache, or transform the requests -->
    <inbound>
        <set-backend-service base-url="https://models.inference.ai.azure.com" />
        <set-header name="Authorization" exists-action="override">
            <value>Bearer {{GITHUB-TOKEN}}</value>
        </set-header>
    </inbound>
    <!-- Control if and how the requests are forwarded to services  -->
    <backend>
        <base />
    </backend>
    <!-- Customize the responses -->
    <outbound>
        <base />
        <send-request mode="new" response-variable-name="usage" timeout="20" ignore-error="true">
            <set-url>https://[Function_URL]/api/tokencounter</set-url>
            <set-method>POST</set-method>
            <set-header name="Content-Type" exists-action="override">
                <value>application/json</value>
            </set-header>
            <set-body>@{
                    return new JObject(
                            new JProperty("RequestBody", context.Request.Body.As<string>(preserveContent: true)),
                            new JProperty("ResponseBody", context.Response.Body.As<string>(preserveContent: true))
                        ).ToString(Newtonsoft.Json.Formatting.None);
                    }</set-body>
        </send-request>
        <set-variable name="totalTokens" value="@{
                var usage = ((IResponse)context.Variables["usage"]).Body.As<JObject>(preserveContent: true).ToString();
                var responseBody = JObject.Parse(usage);
                var totalTokens = responseBody["usage"]["total_tokens"].ToObject<string>();
                return totalTokens.ToString();
        }" />
        <emit-metric name="Total Tokens" value="@(Convert.ToDouble(context.Variables.GetValueOrDefault<string>("totalTokens")))" namespace="apimjayaoai">
            <dimension name="Subscription ID" />
            <dimension name="Client IP" value="@(context.Request.IpAddress)" />
            <dimension name="model" value="Llama-3-8B" />
        </emit-metric>
    </outbound>
    <!-- Handle exceptions and customize error responses  -->
    <on-error>
        <base />
    </on-error>
</policies>

GITHUB-TOKEN: This is the PAT to access GitHub Models. You can create a PAT from your GitHub account.

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
.vscode		.vscode
images		images
test		test
.funcignore		.funcignore
.gitignore		.gitignore
README.md		README.md
function_app.py		function_app.py
host.json		host.json
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

LLM Token Counter Using Azure Functions in Azure API Management

AutoTokenizer

Deploy Azure Function

Azure API Management Policy

Metrics on App Insights

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

LLM Token Counter Using Azure Functions in Azure API Management

AutoTokenizer

Deploy Azure Function

Azure API Management Policy

Metrics on App Insights

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages