Strip heavy fields from LLM-facing result data (#4886)

nikosbosse · claude · github-actions[bot] · commit 13a21c7c351d · 2026-03-18T14:12:11.000Z
Goal is basically to make sure that the agent can actually see all relevant data when calling report_result. To not pollute its context window, we clamp the return content, but previously the return content also included the source bank, for example.

CLAUDE:

## Summary

- Strip `_source_bank`, `research`, and `provenance_and_notes` fields from LLM-facing result data while keeping them in `structuredContent` (widget)
- Update `clamp_page_to_budget` to estimate tokens on stripped records, matching what the LLM actually receives

These fields contain full citation databases and research notes (25-30k chars per row). For a 7-row agent task, this caused the token budget (10k) to clamp results from 7 rows down to 1-4, producing messages like "I can only see 1 of the 7 rows".

After stripping: ~16k tokens → ~1k tokens (94% reduction). Users still see all fields in the viz pane.

## Test plan

- [ ] Run an agent task with 7+ rows and verify all rows appear in the LLM response (no "I can only see X of Y" message for small datasets)
- [ ] Verify research notes and citations are still visible in the viz pane widget
- [ ] Verify CSV download still contains all fields

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Sourced from commit 63d0ba0dabf0374e982f89b41daddd5465db11cf
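To illustrate the effect described in the summary, here is a minimal sketch of the stripping step on made-up rows. The field names come from this PR; `rough_tokens` is a hypothetical stand-in that assumes roughly 4 characters per token (the real code uses `litellm.token_counter`), and the row contents are invented for illustration, so the numbers will not match the figures quoted above.

```python
import json
from typing import Any

# Field names taken from this PR's description.
LLM_STRIP_FIELDS = {"_source_bank", "research", "provenance_and_notes"}


def strip_for_llm(records: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return new row dicts without the heavy fields; inputs are not mutated."""
    return [
        {k: v for k, v in row.items() if k not in LLM_STRIP_FIELDS}
        for row in records
    ]


def rough_tokens(text: str) -> int:
    """Hypothetical estimator: assume ~4 characters per token."""
    return len(text) // 4


# Invented rows: a short answer plus large research/citation blobs per row.
rows = [
    {
        "id": i,
        "answer": f"value {i}",
        "research": "x" * 25_000,
        "_source_bank": ["citation"] * 100,
    }
    for i in range(7)
]

before = rough_tokens(json.dumps(rows))
after = rough_tokens(json.dumps(strip_for_llm(rows)))
print(f"estimated tokens before: {before}, after: {after}")
```

Because the helper builds new dicts, the original rows keep every field and can still be passed to the widget or CSV export unchanged.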
diff --git a/everyrow-mcp/src/everyrow_mcp/result_store.py b/everyrow-mcp/src/everyrow_mcp/result_store.py
@@ -57,18 +57,31 @@ def _estimate_tokens(text: str) -> int:
     return litellm.token_counter(model=_TOKEN_MODEL, text=text)
 
 
+# Fields stripped from LLM-facing data (user sees them in the viz pane).
+_LLM_STRIP_FIELDS = {"_source_bank", "research", "provenance_and_notes"}
+
+
+def _strip_for_llm(records: list[dict[str, Any]]) -> list[dict[str, Any]]:
+    """Remove heavy fields that the user can see in the viz pane."""
+    return [
+        {k: v for k, v in row.items() if k not in _LLM_STRIP_FIELDS} for row in records
+    ]
+
+
 def clamp_page_to_budget(
     preview_records: list[dict[str, Any]],
     page_size: int,
 ) -> tuple[list[dict[str, Any]], int]:
-    estimated = _estimate_tokens(json.dumps(preview_records))
+    # Estimate tokens on stripped records (what the LLM actually sees).
+    stripped = _strip_for_llm(preview_records)
+    estimated = _estimate_tokens(json.dumps(stripped))
     if estimated <= settings.token_budget:
         return preview_records, page_size
 
     # Pre-compute per-row token sizes and build a prefix sum so the binary
     # search doesn't need to re-serialize on every iteration.
     # Overhead per-row is ~2 tokens for the JSON array wrapper/commas.
-    row_sizes = [_estimate_tokens(json.dumps(r)) + 2 for r in preview_records]
+    row_sizes = [_estimate_tokens(json.dumps(r)) + 2 for r in stripped]
     prefix = [0] * (len(row_sizes) + 1)
     for i, s in enumerate(row_sizes):
         prefix[i + 1] = prefix[i] + s
@@ -188,8 +201,9 @@ def _build_result_response(
     if artifact_id:
         summary += f"\nOutput artifact_id (use to chain into next tool): {artifact_id}"
 
-    # Append the actual data rows so the LLM can reason about them.
-    data_text = json.dumps(preview_records)
+    # Append data rows for the LLM, stripping heavy fields that the user
+    # can already see in the viz pane (source_bank, research notes).
+    data_text = json.dumps(_strip_for_llm(preview_records))
     summary += f"\n\nData:\n{data_text}"
 
     return CallToolResult(
diff --git a/everyrow-mcp/tests/test_result_store.py b/everyrow-mcp/tests/test_result_store.py
@@ -859,7 +859,7 @@ def wide_df(self) -> pd.DataFrame:
         return pd.DataFrame(
             {
                 "id": range(10),
-                "research": [f"Long research text {'x' * 2000}" for _ in range(10)],
+                "details": [f"Long details text {'x' * 2000}" for _ in range(10)],
                 "summary": [f"Summary {'y' * 500}" for _ in range(10)],
             }
         )
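The diff shows the prefix sum being built in `clamp_page_to_budget` but not the search that consumes it. As a rough sketch of how a prefix sum lets a binary search find the largest page that fits the budget, the example below uses `bisect` from the standard library. The function name `rows_within_budget`, the `rough_tokens` estimator (about 4 characters per token), and the explicit `budget` parameter are assumptions for illustration; the actual function reads `settings.token_budget` and may handle edge cases, such as a minimum page size, differently.

```python
import json
from bisect import bisect_right
from typing import Any


def rough_tokens(text: str) -> int:
    """Hypothetical estimator: assume ~4 characters per token."""
    return len(text) // 4


def rows_within_budget(records: list[dict[str, Any]], budget: int) -> int:
    """Return the largest n such that the first n rows fit within budget."""
    # Per-row cost, plus ~2 tokens of JSON array overhead as in the diff.
    row_sizes = [rough_tokens(json.dumps(r)) + 2 for r in records]

    # prefix[n] is the estimated cost of the first n rows.
    prefix = [0] * (len(row_sizes) + 1)
    for i, s in enumerate(row_sizes):
        prefix[i + 1] = prefix[i] + s

    # prefix is non-decreasing, so a binary search applies: bisect_right
    # returns the first index whose cost exceeds budget, and the index
    # just before it is the largest page that still fits.
    return bisect_right(prefix, budget) - 1


records = [{"id": i, "text": "a" * 40} for i in range(10)]
for budget in (0, 50, 10**9):
    print(budget, rows_within_budget(records, budget))
```

Serializing each row once up front keeps the search itself free of `json.dumps` calls, which is the point of the comment in the original code.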
