Add global agent tests and prompt tweak #450

Open: hanna-paasivirta wants to merge 3 commits into main from global-agent-job-code
@hanna-paasivirta (Contributor) commented Apr 9, 2026

Short Description

Adds global_chat tests for scenarios beyond one-shot generation, plus a routing prompt improvement.

The new tests address a few of the scenarios in #437 but not all of them.

Implementation Details

The prompt tweak in this PR helps direct multi-step tasks to the planner, even if they only involve one type of agent. This lets the service gather information from across the full workflow, rather than from isolated steps or from the YAML structure without the job code.

Tests

The tests cover a routing matrix across three dimensions:

  • Intent: informational questions vs. modification requests
  • Context: workflow view vs. job code view
  • Scope: single-agent tasks vs. multi-agent tasks
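As a rough illustration of the matrix above, the three dimensions can be enumerated as a Cartesian product, giving eight combinations. This is only a sketch: the value names (`informational`, `workflow_view`, and so on) and the `routing_matrix` helper are hypothetical labels chosen for this example, not identifiers from the test suite in this PR.

```python
from itertools import product

# Hypothetical labels for the three dimensions described above.
INTENTS = ("informational", "modification")
CONTEXTS = ("workflow_view", "job_code_view")
SCOPES = ("single_agent", "multi_agent")


def routing_matrix():
    """Return every intent/context/scope combination as a dict."""
    return [
        {"intent": intent, "context": context, "scope": scope}
        for intent, context, scope in product(INTENTS, CONTEXTS, SCOPES)
    ]


if __name__ == "__main__":
    for case in routing_matrix():
        print(case)
```

A list like this could feed a parametrised test, one case per cell, although the tests in this PR do not necessarily cover every cell.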

A lot of the tests are adapted from Brandon's list of user scenarios, with easily verifiable information added in. They will need expanding and tweaking later.

Test details

Here's my prompt for generating the tests.

  1. Workflow Summary
    Given:
      • Cat Poetry Competition YAML Loaded with job code; one step fetches a cat fact; then one step generates a couplet with Claude in Swedish, and another, in parallel, generates a couplet with ChatGPT in Estonian. Crucially: give the workflow and its steps generic names like “generate text with ChatGPT”, “summarise”, and any information about it being a cat poetry competition should be hidden inside the step code in the body keys, so that we can verify that the model can and will inspect the full YAML and fetch the body keys.
      • User on workflow view
    Prompt: “What does this do?”
    Evaluate:
      • That we route to the correct agent: planner
      • That the response does not just describe a vague text generation workflow, but a cat poetry competition; assert for cat, Estonian, Swedish
  2. Step Summary from Workflow View
    Given:
      • Cat Poetry Competition YAML Loaded; same YAML as 1)
      • User on workflow view
    Prompt: “What does the Claude step do?”
    Evaluate:
      • That we route either to the job code assistant or planner+job code assistant
      • Describes the Claude prompt for a Swedish poem at minimum
      • Describes the role of the step in a workflow, i.e. mentions cat poetry competition (although see Question 1)
      • [Could be a bit vague though, it might be reasonable to just look at the overall workflow – a human might do that too]
  3. Step Summary from Step Editor
    Same as above, just change page URL
    Given:
      • Cat Poetry Competition YAML Loaded
      • User on job code view, in Claude step
    Prompt: “What does this step do?”
    Evaluate:
      • That we route to the job code assistant or planner+job code assistant
      • Describes the Claude prompt for a Swedish poem at minimum
      • Describes the role of the step in a workflow, i.e. mentions cat poetry competition [although see Question 1]
  4. Edit Single Step from Workflow
    Given:
      • Cat Poetry Competition YAML Loaded
      • User on workflow view
    Prompt: “Modify ChatGPT step to ask for a haiku instead of a couplet.”
    Evaluate:
      • Route to job code agent or planner+job code agent
      • Body key of the ChatGPT step is changed and contains “haiku”
      • Rest of YAML unchanged
      • Job code agent input contains the correct job code
  5. Edit Single Step from Step Editor
    Same as above, just change page URL
    Given:
      • Cat Poetry Competition YAML Loaded
      • User on job code view for the ChatGPT step
    Prompt: “Modify to ask for a haiku instead of a poem.”
    Evaluate:
      • Route to job code agent or planner+job code agent
      • Body key of the ChatGPT step is changed and contains “haiku”
      • Rest of YAML unchanged
      • Job code agent input contains the correct job code
  6. Edit Multiple Steps from Workflow View
    Given:
      • Cat Poetry Competition YAML Loaded
      • User on workflow view
    Prompt: “I want all poems to be in French”
    Evaluate:
      • Route to planner+job code agent 2x
      • “French” in both generation steps, and no “Swedish” or “Estonian”
      • Rest of YAML unchanged
  7. Edit Multiple Steps from Step Editor
    Same as above, just change page URL
    Given:
      • Cat Poetry Competition YAML Loaded
      • User on job code view for the ChatGPT step
    Prompt: “I want all poems in the cat poetry competition to be in French”
    Evaluate:
      • Route to planner+job code agent 2x
      • “French” in both generation steps, and no “Swedish” or “Estonian”
      • Rest of YAML unchanged
  8. Add New Steps + Edit Workflow Structure
    Given:
      • Cat Poem YAML Loaded
      • User on workflow view
    Prompt: “Make this a poetry competition. Send it to both Claude and ChatGPT. Send that to another Claude step to be the judge. Then send me the results.”
    Evaluate:
      • Output similar to Cat Poetry Competition YAML
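Several of the checks above ("assert for cat, Estonian, Swedish", "Body key ... is changed", "Rest of YAML unchanged") could be expressed with small helpers like the ones below. This is a minimal sketch under stated assumptions, not the code in this PR: the workflow is modelled as an already-parsed dict with a `jobs` mapping whose entries carry `name` and `body`, and the fixture contents, step keys, and helper names (`make_workflow`, `mentions_all`, `changed_bodies`) are all hypothetical.

```python
import copy


def make_workflow():
    """Hypothetical fixture: generic names, cat details only inside body."""
    return {
        "name": "text-generation",
        "jobs": {
            "fetch-data": {
                "name": "Fetch data",
                "body": "// fetch a random cat fact",
            },
            "generate-text-with-claude": {
                "name": "Generate text with Claude",
                "body": "// cat poetry competition: couplet in Swedish",
            },
            "generate-text-with-chatgpt": {
                "name": "Generate text with ChatGPT",
                "body": "// cat poetry competition: couplet in Estonian",
            },
        },
    }


def mentions_all(text, terms):
    """True if every term occurs in text, ignoring case."""
    lowered = text.lower()
    return all(term.lower() in lowered for term in terms)


def _without_bodies(workflow):
    """Copy of the workflow with every job's body field dropped."""
    jobs = {
        key: {field: value for field, value in job.items() if field != "body"}
        for key, job in workflow["jobs"].items()
    }
    return {**workflow, "jobs": jobs}


def changed_bodies(before, after):
    """Return sorted keys of jobs whose body changed.

    Raises AssertionError if anything other than a job body differs,
    which is one way to check "Rest of YAML unchanged".
    """
    if _without_bodies(before) != _without_bodies(after):
        raise AssertionError("something other than a job body changed")
    return sorted(
        key
        for key in before["jobs"]
        if before["jobs"][key]["body"] != after["jobs"][key]["body"]
    )


if __name__ == "__main__":
    before = make_workflow()
    after = copy.deepcopy(before)
    key = "generate-text-with-chatgpt"
    after["jobs"][key]["body"] = before["jobs"][key]["body"].replace("couplet", "haiku")
    print(changed_bodies(before, after))
```

In a real test, `after` would come from the service's response rather than a manual edit, and the routing assertions would depend on how the service reports which agents were called, which this sketch does not model.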

AI Usage

Please disclose how you've used AI in this work (it's cool, we just want to know!):

  • Code generation (copilot but not intellisense)
  • Learning or fact checking
  • Strategy / design
  • Optimisation / refactoring
  • Translation / spellchecking / doc gen
  • Other
  • I have not used AI

You can read more details in our Responsible AI Policy

@hanna-paasivirta (Contributor, Author)

@josephjclark this should require minimal review. The prompt change would just affect the global_chat service slightly and the tests are still a work in progress.

github-project-automation (bot) moved this to New Issues in Core on Apr 9, 2026