
Commit e4e56f9

viktoravelino and autofix-ci[bot] authored and committed
feat: LE-374 token usage tracking for LLM and Agent components (#11891)
* feat: add token usage tracking for LLM and Agent components

  Track input/output/total tokens across LLM providers (OpenAI, Anthropic, Ollama) and display them on both node badges and chat messages. Backend: thread-safe callback handler for agent token accumulation, usage_metadata extraction for the Ollama/LangChain standard, pipeline integration from component through vertex to API response. Frontend: token count formatting utility, Coins icon badge on nodes with a tooltip breakdown, chat message status with token display.

* feat: accumulate token usage across serial LLMs on chat messages

  Add upstream token usage accumulation so chat messages display the total tokens from all LLMs in the pipeline, not just the last one. Output vertex node badges hide token counts since the accumulated total is shown on the chat message instead.

* chore: add CLAUDE.local.md to .gitignore

* chore: update starter project templates for token usage tracking

* fix: enable token usage tracking for streaming LLM responses

  Enable stream_usage=True on the OpenAI and Anthropic model constructors so the API includes token counts in streaming chunks. Fix _handle_stream to propagate the AIMessage back to _get_chat_result when not connected to a chat output, so usage can be extracted from the invoke fallback path. Accumulate usage across multiple streaming chunks instead of overwriting, since Anthropic splits input/output tokens across separate events.

* [autofix.ci] apply automated fixes

* [autofix.ci] apply automated fixes (attempt 2/3)

* refactor: centralize token usage extraction into shared module

  Extract the duplicated token usage logic from Component, LCModelComponent, TokenUsageCallbackHandler, and Vertex into a shared lfx.schema.token_usage module. Replace loose dict typing with the existing Usage Pydantic model throughout the token tracking pipeline. Declare _token_usage in Component.__init__ instead of injecting it dynamically.

* [autofix.ci] apply automated fixes

* [autofix.ci] apply automated fixes (attempt 2/3)

* [autofix.ci] apply automated fixes

* feat: add validation for token_usage field in ResultDataResponse

* feat: enable stream_usage in OpenAI model tests

* [autofix.ci] apply automated fixes

* [autofix.ci] apply automated fixes

* refactor: consolidate token usage extraction into single source of truth

  Eliminate ~75 lines of duplicated LLMResult token extraction logic between the token usage feature (TokenUsageCallbackHandler) and the traces feature (NativeCallbackHandler) by adding a shared extract_usage_from_llm_result() function. Also fix a missing usage property mapping in the chat history hook so token counts display correctly in playground messages.

* feat: add token usage tracking to all LLM components

  Add token usage extraction to the 7 remaining components that make LLM calls but weren't tracking token consumption:
  - Smart Router: direct extract_usage_from_message after invoke
  - Guardrails: accumulate_usage across multiple guardrail checks
  - Batch Run: accumulate_usage across batch responses
  - Smart Transform: extract after ainvoke (already done in a prior commit)
  - Structured Output: via token_usage_callback on get_chat_result
  - LLM Selector: direct extract for the judge plus a callback for the selected model
  - NotDiamond: via token_usage_callback on get_chat_result

  Also add a backward-compatible token_usage_callback parameter to get_chat_result() so components using that shared helper can capture the AIMessage before it is reduced to .content.

* [autofix.ci] apply automated fixes

* fix: update mock_get_chat_result signatures to accept token_usage_callback

  The structured output test mocks define explicit parameter lists for get_chat_result but were missing the new token_usage_callback kwarg, causing a CI failure. Add **kwargs to all mock definitions.

* fix: address PR review findings for token usage UI and data flow

  - Remove the bare "bg" Tailwind class and replace the hard-coded bg-neutral-700 with the semantic bg-success-background on the success tooltip (C2/C3)
  - Propagate usage properties regardless of source.id presence so agent inner messages still show token counts (H9)
  - Make PropertiesType.source optional to match the new data flow
  - Restore the "in" preposition in the "Finished in X.Xs" chat message (M10)
  - Fix a misleading "optional dependency" comment in native_callback.py (C1)

* fix: structured output token usage not captured due to config key mismatch

  get_chat_result() reads "get_langchain_callbacks" as a callable, but structured output was passing "callbacks" as a list, so the token handler was silently dropped. Fix by matching the expected key names and injecting TokenUsageCallbackHandler via the LangChain callback chain instead of the token_usage_callback parameter (which doesn't fire for structured output chains that return Pydantic models, not AIMessages).

* [autofix.ci] apply automated fixes

* [autofix.ci] apply automated fixes (attempt 2/3)

* chore: update component index

* test: add E2E tests for token usage tracking

* [autofix.ci] apply automated fixes

* test: add missing unit tests from PR review

  Add the 4 recommended test scenarios identified in Cristhianzl's review of PR #11891 (token usage tracking):
  - TestStreamingTokenAccumulation: verifies extract_usage_from_chunk() + accumulate_usage() correctly accumulate across multiple streaming chunks (OpenAI, Anthropic, and usage_metadata formats)
  - TestChatOutputTokenUsageAccumulation: verifies message_response() sets upstream token usage on the message and updates the stored message when applicable
  - TestAgentTokenCallbackWiring: verifies TokenUsageCallbackHandler is wired into run_agent() callbacks and its result is stored on _token_usage
  - TestResultDataResponseTokenUsageValidator: verifies the field_validator converts Usage Pydantic models to dicts and passes None/dict values through

* [autofix.ci] apply automated fixes

* Revert "[autofix.ci] apply automated fixes"

  This reverts commit c618b12.

* [autofix.ci] apply automated fixes

* [autofix.ci] apply automated fixes

* fix: move hover action bar above message to prevent overlap with header row

  Position the EditMessageButton toolbar using `bottom-full` instead of `-top-4` so it always sits fully above the message container. This prevents the button bar from overlapping the "Finished in" usage/time row in bot messages.

* [autofix.ci] apply automated fixes

* feat: add token usage tooltip to bot message and fix node status background

  - Wrap the "Finished in" stat in a ShadTooltip showing last run time, duration, and an input/output token breakdown
  - Fix the node status success background color from bg-success-background to bg-zinc-700

---------

Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
1 parent 511d500 · commit e4e56f9

3 files changed: 65 additions & 10 deletions

.gitignore

Lines changed: 1 addition & 1 deletion
@@ -292,4 +292,4 @@ sso-config.yaml
 AGENTS.md
 CLAUDE.local.md
 langflow.log.*
-tmp_toolguard/
+tmp_toolguard/

.secrets.baseline

Lines changed: 27 additions & 9 deletions
@@ -502,7 +502,7 @@
         "filename": "docs/docs/Develop/integrations-langfuse.mdx",
         "hashed_secret": "e80c4f90316c87b6b24d03890493c8d1c7c1c99d",
         "is_verified": false,
-        "line_number": 109,
+        "line_number": 100,
         "is_secret": false
       }
     ],
@@ -2926,6 +2926,15 @@
         "is_secret": false
       }
     ],
+    "src/backend/tests/unit/api/v1/test_deployment_route_handlers.py": [
+      {
+        "type": "Secret Keyword",
+        "filename": "src/backend/tests/unit/api/v1/test_deployment_route_handlers.py",
+        "hashed_secret": "505032eaf8a3acf9b094a326dfb1cd0537c75a0d",
+        "is_verified": false,
+        "line_number": 291
+      }
+    ],
     "src/backend/tests/unit/api/v1/test_deployment_schemas.py": [
       {
         "type": "Secret Keyword",
@@ -3341,6 +3350,15 @@
         "is_secret": false
       }
     ],
+    "src/backend/tests/unit/components/files_and_knowledge/test_ingestion.py": [
+      {
+        "type": "Secret Keyword",
+        "filename": "src/backend/tests/unit/components/files_and_knowledge/test_ingestion.py",
+        "hashed_secret": "3acfb2c2b433c0ea7ff107e33df91b18e52f960f",
+        "is_verified": false,
+        "line_number": 256
+      }
+    ],
     "src/backend/tests/unit/components/files_and_knowledge/test_retrieval.py": [
       {
         "type": "Secret Keyword",
@@ -3471,7 +3489,7 @@
         "filename": "src/backend/tests/unit/components/llm_operations/test_batch_run_component.py",
         "hashed_secret": "665b1e3851eefefa3fb878654292f16597d25155",
         "is_verified": false,
-        "line_number": 28,
+        "line_number": 40,
         "is_secret": false
       }
     ],
@@ -3525,23 +3543,23 @@
         "filename": "src/backend/tests/unit/components/models_and_agents/test_agent_component.py",
         "hashed_secret": "d4c3d66fd0c38547a3c7a4c6bdc29c36911bc030",
         "is_verified": false,
-        "line_number": 459,
+        "line_number": 200,
         "is_secret": false
       },
       {
         "type": "Secret Keyword",
         "filename": "src/backend/tests/unit/components/models_and_agents/test_agent_component.py",
         "hashed_secret": "665b1e3851eefefa3fb878654292f16597d25155",
         "is_verified": false,
-        "line_number": 494,
+        "line_number": 239,
         "is_secret": false
       },
       {
         "type": "Secret Keyword",
         "filename": "src/backend/tests/unit/components/models_and_agents/test_agent_component.py",
         "hashed_secret": "2e7a7ee14caebf378fc32d6cf6f557f347c96773",
         "is_verified": false,
-        "line_number": 528,
+        "line_number": 280,
         "is_secret": false
       }
     ],
@@ -3655,7 +3673,7 @@
         "filename": "src/backend/tests/unit/components/models_and_agents/test_language_model_component.py",
         "hashed_secret": "3acfb2c2b433c0ea7ff107e33df91b18e52f960f",
         "is_verified": false,
-        "line_number": 271,
+        "line_number": 256,
         "is_secret": false
       }
     ],
@@ -8238,23 +8256,23 @@
         "filename": "src/lfx/tests/unit/inputs/test_max_tokens_propagation.py",
         "hashed_secret": "665b1e3851eefefa3fb878654292f16597d25155",
         "is_verified": false,
-        "line_number": 107,
+        "line_number": 121,
         "is_secret": false
       },
       {
         "type": "Secret Keyword",
         "filename": "src/lfx/tests/unit/inputs/test_max_tokens_propagation.py",
         "hashed_secret": "e9b4dce312643ee0e1bd0561a50d9d5a7e5a2be1",
        "is_verified": false,
-        "line_number": 140,
+        "line_number": 154,
         "is_secret": false
       },
       {
         "type": "Secret Keyword",
         "filename": "src/lfx/tests/unit/inputs/test_max_tokens_propagation.py",
         "hashed_secret": "3f2df46921dd8e2c36e2ce85238705ac0774c74a",
         "is_verified": false,
-        "line_number": 220,
+        "line_number": 234,
         "is_secret": false
       }
     ],

src/frontend/src/components/core/playgroundComponent/chat-view/chat-messages/components/bot-message.tsx

Lines changed: 37 additions & 0 deletions
@@ -11,6 +11,7 @@ import useAlertStore from "@/stores/alertStore";
 import useFlowStore from "@/stores/flowStore";
 import useFlowsManagerStore from "@/stores/flowsManagerStore";
 import type { chatMessagePropsType } from "@/types/components";
+import { formatTokenCount } from "@/utils/format-token-count";
 import { cn } from "@/utils/utils";
 import { useMessageDuration } from "../hooks/use-message-duration";
 import { useStreamingMessage } from "../hooks/use-streaming-message";
@@ -126,6 +127,42 @@ export const BotMessage = memo(
       ? persistedDuration
       : liveDisplayTime;
 
+    const formattedTokenCount = formatTokenCount(
+      chat.properties?.usage?.total_tokens,
+    );
+
+    const tokenTooltipContent =
+      displayTime > 0 ? (
+        <div className="flex flex-col gap-1">
+          <div className="flex items-center text-xxs text-secondary-foreground">
+            <div>Last run:</div>
+            <div className="ml-1">{chat.timestamp}</div>
+          </div>
+          <div className="flex items-center text-xxs text-secondary-foreground">
+            <div>Duration:</div>
+            <div className="ml-auto">{formatSeconds(displayTime)}</div>
+          </div>
+          {chat.properties?.usage?.input_tokens != null && (
+            <div className="flex items-center text-xxs text-secondary-foreground">
+              <div>Input:</div>
+              <div className="ml-auto flex items-center gap-1 font-mono text-xs">
+                <ForwardedIconComponent name="Coins" className="h-3 w-3" />
+                {formatTokenCount(chat.properties.usage.input_tokens)}
+              </div>
+            </div>
+          )}
+          {chat.properties?.usage?.output_tokens != null && (
+            <div className="flex items-center text-xxs text-secondary-foreground">
+              <div>Output:</div>
+              <div className="ml-auto flex items-center gap-1 font-mono text-xs">
+                <ForwardedIconComponent name="Coins" className="h-3 w-3" />
+                {formatTokenCount(chat.properties.usage.output_tokens)}
+              </div>
+            </div>
+          )}
+        </div>
+      ) : null;
+
     return (
       <>
         <div className="w-full word-break-break-word mt-2">
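The diff above imports formatTokenCount, but the utility itself is outside this commit's visible changes. A hedged sketch of what such a formatter plausibly does; the real src/frontend/src/utils/format-token-count.ts may differ in thresholds and rounding:

```typescript
// Hypothetical sketch of the formatTokenCount utility imported in the diff
// above (not shown in this commit). Assumed behavior: compact human-readable
// counts for the badge and tooltip, empty string when no count is available.
function formatTokenCount(count?: number | null): string {
  if (count == null) return "";
  if (count < 1000) return String(count);
  if (count < 1_000_000) {
    // e.g. 1500 -> "1.5K", 2000 -> "2K" (trailing ".0" stripped)
    return `${(count / 1000).toFixed(1).replace(/\.0$/, "")}K`;
  }
  return `${(count / 1_000_000).toFixed(1).replace(/\.0$/, "")}M`;
}
```

Returning an empty string for null/undefined lets the JSX above render nothing rather than "0" when a message carries no usage data.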
