Executive Summary
In the last 24 hours (2026-05-20 12:46 UTC → 2026-05-21 12:46 UTC), agentic workflows in `github/gh-aw` consumed approximately 213.4M total tokens across ~310 unique workflow runs spanning ~100 unique workflows. Per-PR review pipelines (`Test Quality Sentinel`, `Matt Pocock Skills Reviewer`, `PR Code Quality Reviewer`) dominate aggregate consumption because they run so often, while several once-daily reports (Firewall Logs Collector, Community Attribution, Package Spec Librarian, Linter Miner) consume 3–6M tokens in a single run and are the most expensive per invocation. The companion `errors` and `logs` datasets returned no events, so the telemetry pipeline appears healthy.
Key Metrics
| Metric | Value |
|---|---|
| Events analyzed (gen_ai spans w/ token data) | ~1,107 |
| Events with token data | ~1,107 (100%) |
| Total input tokens | ~211M |
| Total output tokens | ~2.4M |
| Total tokens | ~213.4M |
| Unique workflows | ~100 |
| Unique workflow runs | ~310 |
| Avg tokens / run | ~688K |
| P95 tokens / run (est.) | ~3.5M |
Note: each workflow run typically emits 4 sibling `gen_ai` spans reporting the same token totals. To avoid double-counting, the aggregate sums above were divided by the per-workflow `count() / count_unique(gh-aw.run.id)` ratio. The raw aggregate `sum(gen_ai.usage.total_tokens)` across all spans was ~854M (= 213.4M × ~4 duplication factor).
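The de-duplication described in the note can be sketched as follows. This is a minimal illustration, not the query the report actually ran: the function name and the sample numbers are hypothetical, and it assumes every sibling span in a run reports the same total.

```python
def deduplicated_total(raw_sum: float, span_count: int, unique_runs: int) -> float:
    """Scale a raw per-workflow token sum down by its span duplication factor.

    duplication factor = count() / count_unique(gh-aw.run.id)
    """
    if span_count <= 0 or unique_runs <= 0:
        raise ValueError("span_count and unique_runs must be positive")
    duplication_factor = span_count / unique_runs
    return raw_sum / duplication_factor


# Hypothetical workflow: 10 runs, 4 sibling spans per run,
# each span reporting the same 500,000-token total.
raw = 40 * 500_000
total = deduplicated_total(raw, span_count=40, unique_runs=10)
print(total)
```

Because the ratio is an average per workflow, runs that emit 2 or 5 siblings instead of 4 are only corrected approximately, which is consistent with the report calling duplication its dominant source of estimation error.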
Top 10 Workflows by Token Consumption
| Workflow | Runs | Input Tokens | Output Tokens | Total Tokens | Avg/Run |
|---|---|---|---|---|---|
| Test Quality Sentinel | 39 | 28,351,864 | 288,579 | 28,640,443 | 734,370 |
| Matt Pocock Skills Reviewer | 31 | 21,382,720 | 162,941 | 21,545,661 | 695,021 |
| PR Code Quality Reviewer | 40 | 21,343,753 | 158,485 | 21,502,238 | 537,556 |
| PR Sous Chef | 22 | ~18.2M | ~497K | 18,701,464 | 850,066 |
| Chaos PR Bundle Fuzzer | 7 | 10,115,455 | 65,192 | 10,180,647 | 1,454,378 |
| Contribution Check | 5 | 9,969,561 | 79,687 | 10,049,248 | 2,009,850 |
| Daily Firewall Logs Collector and Reporter | 1 | 5,835,404 | 39,262 | 5,874,666 | 5,874,666 |
| Daily Community Attribution Updater | 1 | 4,221,504 | 21,054 | 4,242,558 | 4,242,558 |
| Package Specification Librarian | 1 | 3,478,460 | 51,904 | 3,530,364 | 3,530,364 |
| Linter Miner | 1 | 3,118,624 | 31,975 | 3,150,599 | 3,150,599 |
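The table mixes two different notions of "expensive": aggregate consumption and cost per invocation. A short sketch using four rows from the table shows how the ranking flips between the two; the dictionary layout and helper name are illustrative only.

```python
# (runs, total tokens) copied from the table above
workflows = {
    "Test Quality Sentinel": (39, 28_640_443),
    "PR Code Quality Reviewer": (40, 21_502_238),
    "Contribution Check": (5, 10_049_248),
    "Daily Firewall Logs Collector and Reporter": (1, 5_874_666),
}


def avg_per_run(runs: int, total: int) -> int:
    """Average tokens per run, rounded to the nearest token."""
    return round(total / runs)


# Rank by aggregate consumption, then by per-invocation cost.
by_total = sorted(workflows, key=lambda w: workflows[w][1], reverse=True)
by_avg = sorted(workflows, key=lambda w: avg_per_run(*workflows[w]), reverse=True)

print(by_total[0])  # heaviest in aggregate
print(by_avg[0])    # heaviest per run
```

Ranked by total, the high-frequency reviewer leads; ranked by average, the single-run daily report does, which is why the two groups call for different remedies.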
Highest-token single run per top workflow
| Workflow | Run ID | Total Tokens |
|---|---|---|
| Daily Firewall Logs Collector and Reporter | 26203472250 | 5,874,666 |
| PR Sous Chef (highest single run) | 26222262708 | 4,319,402 |
| Daily Community Attribution Updater | 26203736817 | 4,242,558 |
| PR Sous Chef (second-highest) | 26218530960 | 3,926,768 |
| Package Specification Librarian | 26167560886 | 3,530,364 |
| Linter Miner | 26180896986 | 3,150,599 |
| Daily SPDD Spec Planner | 26176947757 | 3,070,226 |
| Daily Security Observability Report | 26176856527 | 2,772,393 |
| Contribution Check (highest single run) | 26166594512 | 2,443,973 |
| Chaos PR Bundle Fuzzer (highest single run) | 26218516360 | 2,414,476 |
| UK AI Operational Resilience | 26175963806 | 2,287,188 |
| Daily MCP Tool Concurrency Analysis | 26220221074 | 2,212,930 |
| Daily Agent of the Day Blog Writer | 26172025729 | 2,202,516 |
| Test Quality Sentinel (highest single run) | 26203507094 | 2,106,996 |
| Daily Regulatory Report Generator | 26191963827 | 2,099,186 |
Token output ratios — workflows with unusually high output share
Most workflows are input-heavy (output is under 5% of input). The workflows below are the exceptions, where output dominates and may indicate generative/long-form behavior. Output Share is output as a fraction of input plus output:
| Workflow | Input | Output | Output Share |
|---|---|---|---|
| Daily Code Metrics and Trend Tracking Agent | 17,583 | 32,735 | 65.1% |
| Daily Reliability Review | ~7,047 | ~20,674 | 74.6% |
| Agentic Workflow Audit Agent | 10,068 | 34,819 | 77.6% |
| Sergo - Serena Go Expert | 19,646 | 35,409 | 64.3% |
| Daily Sub-Agent Optimizer | 18,122 | 36,804 | 67.0% |
| [aw] Failure Investigator (6h) | ~7,200 | ~20,000–32,000 | ~70–82% |
| Copilot Session Insights | 21,604 | 27,081 | 55.6% |
| GitHub API Consumption Report Agent | 14,700 | 27,649 | 65.3% |
| DeepReport - Intelligence Gathering Agent | 13,810 | 21,843 | 61.2% |
These small-input/large-output workflows likely use Claude/Anthropic models in generation-heavy mode (analysis reports, documentation rewrites).
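The Output Share figures in the table are consistent with output divided by input plus output, rather than output divided by input. A minimal sketch of that calculation, with a hypothetical helper name, checked against two rows of the table:

```python
def output_share(input_tokens: int, output_tokens: int) -> float:
    """Fraction of a workflow's total tokens that were model output."""
    total = input_tokens + output_tokens
    if total == 0:
        return 0.0  # no token data; avoid dividing by zero
    return output_tokens / total


# Daily Code Metrics and Trend Tracking Agent
code_metrics = round(output_share(17_583, 32_735) * 100, 1)
# Agentic Workflow Audit Agent
audit_agent = round(output_share(10_068, 34_819) * 100, 1)

print(code_metrics, audit_agent)
```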
Data Quality and Gaps
- Span duplication: each workflow run emits 2–5 sibling `gen_ai` spans that report the same token totals. Aggregation divided the raw `sum()` by the per-workflow `count()/count_unique(gh-aw.run.id)` ratio (typically 4) to avoid inflation. This is the dominant source of estimation error.
- Pagination coverage: ~700+ raw spans were paginated explicitly; the aggregate `sum()/count()` query covered the full 24h window for the top-100 workflows. ~5–10 small-volume workflows fall below the 100-result truncation and are excluded from the grand total; their contribution is estimated at <500K tokens.
- Companion datasets: the `errors` and `logs` datasets returned zero events in 24h for the `gh-aw` project. No telemetry-emit failures or runtime errors were observed.
- Workflow name attribution: 100% of token-bearing spans had `gh-aw.workflow.name` populated. Two appearances of `[Filtered]` (2 unique runs, 701,298 + 141,418 = 842,716 tokens) reflect Sentry's PII scrubbing of the workflow name attribute, not actual workflow naming. Worth confirming why those specific workflow names trigger PII filters in `actions/setup/js/send_otlp_span.cjs`.
- Missing model attribution: `gen_ai.request.model` was populated on ~25% of spans (typically only the leaf span carries it). Most reported `claude-sonnet-4.5`, `gpt-5-mini`, `claude-haiku-4.5`, or `auto`. Spotted anomaly: one span tagged `gpt-5.4` and one tagged `gpt-5.4-mini`, possibly typos or experimental model strings worth verifying.
- `gh-aw.workflow.name` vs `github.workflow`: the OTLP emit code populates the gh-aw-prefixed attribute, but the standard `github.workflow` attribute was empty across all spans inspected. Worth aligning these for consumers expecting the conventional OTel/GitHub semantic.
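The model-attribution check could be automated along the following lines. This is a sketch under assumptions: spans are represented as plain dictionaries keyed by attribute name (a hypothetical shape, not Sentry's export format), and the allowlist is simply the set of model strings this report names as common, not an authoritative list.

```python
from collections import Counter

# Model strings the report lists as commonly seen; treated here as an
# illustrative allowlist only.
KNOWN_MODELS = {"claude-sonnet-4.5", "gpt-5-mini", "claude-haiku-4.5", "auto"}


def summarize_models(spans: list[dict]) -> tuple[float, Counter]:
    """Return (share of spans with a model attribute, counts of unlisted models)."""
    populated = [
        s["gen_ai.request.model"] for s in spans if s.get("gen_ai.request.model")
    ]
    coverage = len(populated) / len(spans) if spans else 0.0
    unexpected = Counter(m for m in populated if m not in KNOWN_MODELS)
    return coverage, unexpected


# Hypothetical sample: 8 spans, only 2 of which carry the model attribute.
sample = [
    {"gen_ai.request.model": "gpt-5-mini"}, {}, {}, {},
    {"gen_ai.request.model": "gpt-5.4"}, {}, {}, {},
]
coverage, unexpected = summarize_models(sample)
print(coverage, dict(unexpected))
```

A non-empty `unexpected` counter is only a prompt to verify the string; an unlisted value may be a legitimate experimental model rather than a typo.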
Recommendations
- Investigate `PR Sous Chef` cost variance: observed runs range from 178K to 4.32M tokens (a 24x spread). Runs `26222262708` (4.32M) and `26218530960` (3.93M) used `gpt-5-mini` but consumed Claude-Sonnet-tier token volumes. Add a per-run input-context audit and consider prompt caching or truncation if the high runs replay large PR diffs.
- Cap context for single-run "daily" workflows >3M tokens: `Daily Firewall Logs Collector and Reporter` (5.87M), `Daily Community Attribution Updater` (4.24M), `Package Specification Librarian` (3.53M), `Linter Miner` (3.15M), and `Daily SPDD Spec Planner` (3.07M) each burn 3–6M tokens in one shot. Audit their MCP queries / Sentry windows for over-fetching and consider pagination or summarization passes before sending to the model.
- De-duplicate sibling `gen_ai` spans at emit time: the 4x span duplication inflates all OTel sums and confuses cost dashboards. Either emit token usage only on the leaf span, or add a `gen_ai.span.role` attribute (parent/leaf) so consumers can filter cleanly. Reference: `actions/setup/js/send_otlp_span.cjs`.
- Right-size the high-frequency PR review fleet: `Test Quality Sentinel`, `Matt Pocock Skills Reviewer`, and `PR Code Quality Reviewer` ran 31–40 times each in 24h, consuming ~71.7M tokens combined. Consider merging overlapping reviewer agents, raising trigger thresholds (skip drafts/WIP), or sharing a cached repo-context blob across the three review pipelines.
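If the proposed role attribute were adopted, consumers could sum token usage without a duplication correction. The sketch below assumes the `gen_ai.span.role` attribute suggested above, which does not exist in the current telemetry, along with a hypothetical dictionary span shape and made-up run IDs.

```python
from collections import defaultdict


def tokens_per_run(spans: list[dict]) -> dict[str, int]:
    """Sum token usage per run, counting only spans marked as leaf.

    Relies on the proposed (not yet emitted) gen_ai.span.role attribute.
    """
    totals: dict[str, int] = defaultdict(int)
    for span in spans:
        if span.get("gen_ai.span.role") != "leaf":
            continue  # parent spans repeat the leaf's totals
        totals[span["gh-aw.run.id"]] += span["gen_ai.usage.total_tokens"]
    return dict(totals)


def make_span(run_id: str, role: str, tokens: int) -> dict:
    """Build a hypothetical span record for the example."""
    return {
        "gh-aw.run.id": run_id,
        "gen_ai.span.role": role,
        "gen_ai.usage.total_tokens": tokens,
    }


# Run r1 emits 4 sibling spans, run r2 emits 2; each repeats the run total.
spans = (
    [make_span("r1", "parent", 1000)] * 3 + [make_span("r1", "leaf", 1000)]
    + [make_span("r2", "parent", 500), make_span("r2", "leaf", 500)]
)
naive_sum = sum(s["gen_ai.usage.total_tokens"] for s in spans)
per_run = tokens_per_run(spans)
print(naive_sum, per_run)
```

Filtering on a role attribute stays exact even when runs emit different numbers of siblings, whereas dividing by an average duplication factor does not.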
Generated by 📊 Daily Token Consumption Report (Sentry OTel) · ● 36.2M · ◷