refactor: introduce canonical tool naming convention and update evalsets & model generators to reflect this change. by omkargaikwad23 · Pull Request #400 · GoogleCloudPlatform/evalbench

omkargaikwad23 · 2026-05-21T11:54:11Z

Summary

This change centralizes tool name normalization in a new utility module, so the trajectory matcher stays generator-agnostic while tool identities are preserved across agent platforms.

Key Changes

Canonical Naming Convention: Standardized MCP tool names to the server__tool format (using double underscores) to distinguish between tools with the same name across different servers.
Normalization Utility: Added evalbench/generators/models/tool_naming.py to handle specific formatting differences from Claude Code (mcp__), Gemini CLI (mcp_), and Codex CLI.
Agnostic Scoring: Refactored TrajectoryMatcher to remove per-generator normalization logic, enabling simpler string-based comparisons of pre-normalized trajectories.
Dataset Migration: Updated expected_trajectory entries in evaluation sets (e.g., Cloud SQL and Bigtable datasets) to reflect the new canonical format.
Improved Testing: Added comprehensive unit tests for the naming helpers and updated trajectory matcher tests to verify the new standard.
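
To illustrate the idea behind the canonical format, here is a minimal sketch of how raw tool names might be normalized to server__tool and then compared as plain strings. It is not the code in tool_naming.py: the function names, the known_servers parameter, the example server and tool names, and the assumed raw shapes (mcp__<server>__<tool> for Claude Code, mcp_<server>_<tool> for Gemini CLI) are all assumptions made for illustration.

```python
from typing import Iterable, Sequence

CANONICAL_SEP = "__"  # separator in the canonical <server>__<tool> format


def to_canonical(raw_name: str, known_servers: Iterable[str] = ()) -> str:
    """Hypothetical normalizer; not the actual helper from this PR."""
    # Assumed Claude Code style: mcp__<server>__<tool>.
    # Dropping the prefix already leaves <server>__<tool>.
    if raw_name.startswith("mcp__"):
        return raw_name[len("mcp__"):]

    # Assumed Gemini CLI style: mcp_<server>_<tool>.
    # A single underscore is ambiguous, so this sketch needs the server
    # names up front and tries the longest candidate first.
    if raw_name.startswith("mcp_"):
        rest = raw_name[len("mcp_"):]
        for server in sorted(known_servers, key=len, reverse=True):
            if rest.startswith(server + "_"):
                tool = rest[len(server) + 1:]
                return server + CANONICAL_SEP + tool
        return rest

    # Anything else (for example a built-in tool) passes through unchanged.
    return raw_name


def trajectories_match(expected: Sequence[str], actual: Sequence[str]) -> bool:
    """Exact, ordered comparison of two pre-normalized trajectories."""
    return list(expected) == list(actual)


if __name__ == "__main__":
    servers = ["cloud_sql", "bigtable"]  # hypothetical server names
    raw = ["mcp__cloud_sql__list_instances", "mcp_bigtable_read_rows"]
    actual = [to_canonical(name, servers) for name in raw]
    expected = ["cloud_sql__list_instances", "bigtable__read_rows"]
    print(trajectories_match(expected, actual))
```

Because normalization happens before scoring, the comparison itself needs no knowledge of which generator produced the trajectory.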

IsmailMehdi · 2026-05-21T18:57:37Z

/gcbrun

refactor: introduce canonical tool naming convention and update evalsets and model generators to reflect this change. (6cead86)

omkargaikwad23 requested a review from IsmailMehdi as a code owner May 21, 2026 11:54

omkargaikwad23 requested a review from prernakakkar-google May 21, 2026 11:54

github-code-quality Bot found potential problems May 21, 2026

Comment thread evalbench/generators/models/tool_naming.py Fixed

omkargaikwad23 added 2 commits May 21, 2026 12:08

refactor: standardize tool names to snake_case and update documentation accordingly (f0e4508)

docs: document and standardize canonical <server>__<tool> naming for expected trajectories (ac5b0cf)

IsmailMehdi approved these changes May 21, 2026

IsmailMehdi merged commit ed34fcd into main May 21, 2026
10 checks passed

IsmailMehdi merged 3 commits into main from trajectory_matcher
