[token-consumption] Daily Token Consumption Report - 2026-05-21 #33751

@github-actions

Executive Summary

In the last 24 hours (2026-05-20 12:46 UTC → 2026-05-21 12:46 UTC), agentic workflows in github/gh-aw consumed approximately 213.4M total tokens across ~310 unique workflow runs spanning ~100 unique workflows. Per-PR review pipelines (Test Quality Sentinel, Matt Pocock Skills Reviewer, PR Code Quality Reviewer) dominate aggregate consumption due to high run frequency, while several single-run daily reports (Firewall Logs Collector, Community Attribution, Package Spec Librarian, Linter Miner) consume 3–6M tokens per single run and are the most expensive per-invocation. No errors or log entries were observed in companion errors / logs datasets — telemetry pipeline appears healthy.

Key Metrics

| Metric | Value |
|---|---|
| Events analyzed (gen_ai spans w/ token data) | ~1,107 |
| Events with token data | ~1,107 (100%) |
| Total input tokens | ~211M |
| Total output tokens | ~2.4M |
| Total tokens | ~213.4M |
| Unique workflows | ~100 |
| Unique workflow runs | ~310 |
| Avg tokens / run | ~688K |
| P95 tokens / run (est.) | ~3.5M |

Note: each workflow run typically emits 4 sibling gen_ai spans reporting the same token totals; aggregate sums above were divided by the per-workflow count() / count_unique(gh-aw.run.id) ratio to avoid double-counting. Raw aggregate sum(gen_ai.usage.total_tokens) across all spans was ~854M (= 213.4M × ~4 duplication factor).
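The de-duplication described in the note can be sketched as follows. This is a minimal illustration, not the query the report actually ran: the span shape (dicts with `workflow`, `run_id`, and `total_tokens` keys) and the function name `dedup_token_totals` are assumptions made for the example.

```python
from collections import defaultdict


def dedup_token_totals(spans):
    """Estimate per-workflow token totals from duplicated sibling spans.

    `spans` is an iterable of dicts with hypothetical keys:
    "workflow", "run_id", and "total_tokens".
    """
    raw_sum = defaultdict(int)     # sum(total_tokens) over all spans
    span_count = defaultdict(int)  # count() of spans
    run_ids = defaultdict(set)     # basis for count_unique(run_id)

    for span in spans:
        wf = span["workflow"]
        raw_sum[wf] += span["total_tokens"]
        span_count[wf] += 1
        run_ids[wf].add(span["run_id"])

    totals = {}
    for wf in raw_sum:
        # Duplication factor: spans per unique run for this workflow.
        duplication = span_count[wf] / len(run_ids[wf])
        totals[wf] = raw_sum[wf] / duplication
    return totals


# Illustrative data: two runs, each emitting 4 sibling spans with identical totals.
example_spans = (
    [{"workflow": "demo", "run_id": 1, "total_tokens": 1000}] * 4
    + [{"workflow": "demo", "run_id": 2, "total_tokens": 3000}] * 4
)
print(dedup_token_totals(example_spans))  # {'demo': 4000.0}
```

The division is exact only when every run of a workflow emits the same number of sibling spans; when the count varies between runs, the result is an estimate.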

Top 10 Workflows by Token Consumption

| Workflow | Runs | Input Tokens | Output Tokens | Total Tokens | Avg/Run |
|---|---|---|---|---|---|
| Test Quality Sentinel | 39 | 28,351,864 | 288,579 | 28,640,443 | 734,370 |
| Matt Pocock Skills Reviewer | 31 | 21,382,720 | 162,941 | 21,545,661 | 695,021 |
| PR Code Quality Reviewer | 40 | 21,343,753 | 158,485 | 21,502,238 | 537,556 |
| PR Sous Chef | 22 | ~18.2M | ~497K | 18,701,464 | 850,066 |
| Chaos PR Bundle Fuzzer | 7 | 10,115,455 | 65,192 | 10,180,647 | 1,454,378 |
| Contribution Check | 5 | 9,969,561 | 79,687 | 10,049,248 | 2,009,850 |
| Daily Firewall Logs Collector and Reporter | 1 | 5,835,404 | 39,262 | 5,874,666 | 5,874,666 |
| Daily Community Attribution Updater | 1 | 4,221,504 | 21,054 | 4,242,558 | 4,242,558 |
| Package Specification Librarian | 1 | 3,478,460 | 51,904 | 3,530,364 | 3,530,364 |
| Linter Miner | 1 | 3,118,624 | 31,975 | 3,150,599 | 3,150,599 |

Highest-token single run per top workflow

| Workflow | Run ID | Total Tokens |
|---|---|---|
| Daily Firewall Logs Collector and Reporter | 26203472250 | 5,874,666 |
| PR Sous Chef (highest single run) | 26222262708 | 4,319,402 |
| Daily Community Attribution Updater | 26203736817 | 4,242,558 |
| PR Sous Chef (second-highest) | 26218530960 | 3,926,768 |
| Package Specification Librarian | 26167560886 | 3,530,364 |
| Linter Miner | 26180896986 | 3,150,599 |
| Daily SPDD Spec Planner | 26176947757 | 3,070,226 |
| Daily Security Observability Report | 26176856527 | 2,772,393 |
| Contribution Check (highest single run) | 26166594512 | 2,443,973 |
| Chaos PR Bundle Fuzzer (highest single run) | 26218516360 | 2,414,476 |
| UK AI Operational Resilience | 26175963806 | 2,287,188 |
| Daily MCP Tool Concurrency Analysis | 26220221074 | 2,212,930 |
| Daily Agent of the Day Blog Writer | 26172025729 | 2,202,516 |
| Test Quality Sentinel (highest single run) | 26203507094 | 2,106,996 |
| Daily Regulatory Report Generator | 26191963827 | 2,099,186 |

Token output ratios: workflows with unusually high output share

Most workflows are input-heavy (output / input < 5%). In the workflows below, output exceeds input, which may indicate generative or long-form behavior. Output Share is output / (input + output):

| Workflow | Input | Output | Output Share |
|---|---|---|---|
| Daily Code Metrics and Trend Tracking Agent | 17,583 | 32,735 | 65.1% |
| Daily Reliability Review | ~7,047 | ~20,674 | 74.6% |
| Agentic Workflow Audit Agent | 10,068 | 34,819 | 77.6% |
| Sergo - Serena Go Expert | 19,646 | 35,409 | 64.3% |
| Daily Sub-Agent Optimizer | 18,122 | 36,804 | 67.0% |
| [aw] Failure Investigator (6h) | ~7,200 | ~20,000–32,000 | ~70–82% |
| Copilot Session Insights | 21,604 | 27,081 | 55.6% |
| GitHub API Consumption Report Agent | 14,700 | 27,649 | 65.3% |
| DeepReport - Intelligence Gathering Agent | 13,810 | 21,843 | 61.3% |

These small-input/large-output workflows likely use Claude/Anthropic models in generation-heavy mode (analysis reports, documentation rewrites).
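The Output Share figures above are consistent with output divided by total (input plus output) tokens. A minimal sketch of that calculation, using two rows from the table; the helper name `output_share` is introduced here for illustration only:

```python
def output_share(input_tokens, output_tokens):
    """Return output tokens as a fraction of total (input + output) tokens."""
    total = input_tokens + output_tokens
    if total == 0:
        return 0.0
    return output_tokens / total


# (input, output) pairs copied from the table above.
rows = {
    "Agentic Workflow Audit Agent": (10_068, 34_819),
    "Copilot Session Insights": (21_604, 27_081),
}
for name, (inp, out) in rows.items():
    print(f"{name}: {output_share(inp, out):.1%}")
# Agentic Workflow Audit Agent: 77.6%
# Copilot Session Insights: 55.6%
```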

Data Quality and Gaps
  • Span duplication: each workflow run emits 2–5 sibling gen_ai spans that report the same token totals. Aggregation divided raw sum() by per-workflow count()/count_unique(gh-aw.run.id) ratio (typically 4) to avoid inflation. This is the dominant source of estimation error.
  • Pagination coverage: ~700+ raw spans were paginated explicitly; aggregate sum()/count() query covered the full 24h window for the top-100 workflows. ~5–10 small-volume workflows fall below the 100-result truncation and are excluded from the grand total — their contribution is estimated <500K tokens.
  • Companion datasets: errors and logs datasets returned zero events in 24h for the gh-aw project. No telemetry-emit failures or runtime errors observed.
  • Workflow name attribution: 100% of token-bearing spans had gh-aw.workflow.name populated. Two appearances of [Filtered] (2 unique runs, 701,298 + 141,418 = 842,716 tokens) reflect Sentry's PII scrubbing of the workflow name attribute, not actual workflow naming. Worth confirming why those specific workflow names trigger PII filters in actions/setup/js/send_otlp_span.cjs.
  • Missing model attribution: gen_ai.request.model was populated on ~25% of spans (typically only the leaf span carries it). Most reported claude-sonnet-4.5, gpt-5-mini, claude-haiku-4.5, or auto. Spotted anomaly: one span tagged gpt-5.4 and one gpt-5.4-mini — possibly typos or experimental model strings worth verifying.
  • gh-aw.workflow.name vs github.workflow: The OTLP emit code populates the gh-aw-prefixed attribute, but the standard github.workflow attribute was empty across all spans inspected. Worth aligning these for consumers expecting the conventional OTel/GitHub semantic.
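One way to make the model-attribution checks above repeatable is a small audit over exported span attributes. The sketch below is illustrative only: the span shape (a list of attribute dicts) and the function name `audit_model_attribution` are assumptions, and treating the four commonly reported model strings as an allowlist is a choice made for this example, not something the telemetry pipeline is known to do.

```python
from collections import Counter

# Model strings the report describes as common; using them as an
# allowlist is an assumption made for this sketch.
EXPECTED_MODELS = {"claude-sonnet-4.5", "gpt-5-mini", "claude-haiku-4.5", "auto"}


def audit_model_attribution(spans):
    """Return (coverage, unexpected) for the gen_ai.request.model attribute.

    `spans` is a list of attribute dicts (hypothetical export shape).
    coverage   -- fraction of spans with a non-empty model attribute
    unexpected -- Counter of model strings outside EXPECTED_MODELS
    """
    models = [s.get("gen_ai.request.model") for s in spans]
    populated = [m for m in models if m]
    coverage = len(populated) / len(spans) if spans else 0.0
    unexpected = Counter(m for m in populated if m not in EXPECTED_MODELS)
    return coverage, unexpected


# Illustrative input: two spans without a model, one expected, one anomalous.
sample = [
    {"gen_ai.request.model": "gpt-5-mini"},
    {},
    {},
    {"gen_ai.request.model": "gpt-5.4"},
]
coverage, unexpected = audit_model_attribution(sample)
print(f"{coverage:.0%}", dict(unexpected))  # 50% {'gpt-5.4': 1}
```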

Recommendations

  1. Investigate PR Sous Chef cost variance — observed runs range from 178K to 4.32M tokens (24x spread). Run 26222262708 (4.32M) and 26218530960 (3.93M) used gpt-5-mini but consumed Claude-Sonnet-tier token volumes. Add a per-run input-context audit and consider prompt-caching or truncation if the high runs replay large PR diffs.
  2. Cap context for single-run "daily" workflows >3M tokens: Daily Firewall Logs Collector and Reporter (5.87M), Daily Community Attribution Updater (4.24M), Package Specification Librarian (3.53M), Linter Miner (3.15M), and Daily SPDD Spec Planner (3.07M) each burn 3–6M tokens in one shot. Audit their MCP queries / Sentry windows for over-fetching and consider pagination or summarization passes before sending to the model.
  3. De-duplicate sibling gen_ai spans at emit time — the 4x span duplication inflates all OTel sums and confuses cost dashboards. Either emit token usage only on the leaf span, or add a gen_ai.span.role attribute (parent/leaf) so consumers can filter cleanly. Reference: actions/setup/js/send_otlp_span.cjs.
  4. Right-size high-frequency PR review fleet: Test Quality Sentinel, Matt Pocock Skills Reviewer, and PR Code Quality Reviewer ran 31–40 times each in 24h, consuming ~71.7M tokens combined. Consider merging overlapping reviewer agents, increasing trigger thresholds (skip drafts/WIP), or sharing a cached repo-context blob across the three review pipelines.
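The checks behind recommendations 1 and 2 (per-workflow spread and a per-run cap) can be sketched as a small helper. This is a hedged illustration: `flag_runs` and its parameters are hypothetical, the 3M cap mirrors the threshold named in recommendation 2, and the `max_spread` default of 10 is an arbitrary assumption. The PR Sous Chef figures come from this report; the run ID of the ~178K run is not given, so a placeholder key is used.

```python
def flag_runs(run_tokens, cap=3_000_000, max_spread=10.0):
    """Flag runs above `cap` and check the max/min spread for one workflow.

    `run_tokens` maps a run ID (string) to that run's total tokens.
    """
    over_cap = sorted(rid for rid, tokens in run_tokens.items() if tokens > cap)
    lowest = min(run_tokens.values())
    highest = max(run_tokens.values())
    spread = highest / lowest if lowest else float("inf")
    return {
        "over_cap": over_cap,
        "spread": spread,
        "high_variance": spread > max_spread,
    }


# PR Sous Chef runs cited in the report. The low run's ID is not
# listed there, so the key below is a placeholder.
pr_sous_chef = {
    "26222262708": 4_319_402,
    "26218530960": 3_926_768,
    "low-run (ID not in report)": 178_000,
}
result = flag_runs(pr_sous_chef)
print(result["over_cap"])          # ['26218530960', '26222262708']
print(round(result["spread"], 1))  # 24.3
print(result["high_variance"])     # True
```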

Generated by 📊 Daily Token Consumption Report (Sentry OTel) · ● 36.2M

  • expires on May 22, 2026, 12:58 PM UTC
