Add global agent tests and prompt tweak by hanna-paasivirta · Pull Request #450 · OpenFn/apollo

hanna-paasivirta · 2026-04-09T15:31:42Z

Short Description

Adds global_chat tests for scenarios besides one-shot generation and a routing prompt improvement.

The new tests address a few of the scenarios in #437 but not all of them.

Implementation Details

The prompt tweak in this PR helps direct multi-step tasks to the planner, even when they involve only one type of agent. This lets the service gather information from across the full workflow, rather than from isolated steps or from the YAML structure without the job code.

Tests

The tests cover a routing matrix across three dimensions:

Intent: informational questions vs. modification requests
Context: workflow view vs. job code view
Scope: single-agent tasks vs. multi-agent tasks
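To make the shape of the matrix concrete, here is a minimal sketch that enumerates the 2 × 2 × 2 = 8 combinations of the three dimensions. The label strings and the `routing_matrix` helper are hypothetical names for illustration, not identifiers from this PR, and not every cell necessarily has a test yet.

```python
from itertools import product

# Hypothetical labels for the three dimensions listed above.
INTENTS = ("informational", "modification")
CONTEXTS = ("workflow_view", "job_code_view")
SCOPES = ("single_agent", "multi_agent")


def routing_matrix():
    """Return every (intent, context, scope) combination as a dict."""
    return [
        {"intent": intent, "context": context, "scope": scope}
        for intent, context, scope in product(INTENTS, CONTEXTS, SCOPES)
    ]


if __name__ == "__main__":
    # Print each cell so gaps in test coverage are easy to spot by eye.
    for cell in routing_matrix():
        print(cell)
```

A list like this could feed a parametrised test, with each cell mapped to the scenario and expected route it covers.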

A lot of the tests are adapted from Brandon's list of user scenarios, with easily verifiable information added in. They will need expanding and tweaking later.

Test details

Here's my prompt for generating the tests.

Workflow Summary
Given:

Cat Poetry Competition YAML loaded with job code: one step fetches a cat fact, then one step generates a couplet in Swedish with Claude while another, in parallel, generates a couplet in Estonian with ChatGPT. Crucially, give the workflow and its steps generic names like “generate text with ChatGPT” or “summarise”, and hide any information about it being a cat poetry competition inside the step code in the body keys, so that we can verify that the model can and will inspect the full YAML and fetch the body keys.
User on workflow view
Prompt: “What does this do?”
Evaluate:
That we route to the correct agent: planner
That the response does not just describe a vague text generation workflow, but a cat poetry competition; assert for cat, Estonian, Swedish
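The evaluation for this scenario could be sketched roughly as below. It assumes the workflow YAML has been parsed into nested dicts and that the test harness exposes the chosen route and the response text as plain strings. The fixture contents, the `jobs` and `name` keys, the route label `"planner"`, and both helper names are assumptions for illustration. Only the `body` key and the three keywords come from the description above.

```python
# Hypothetical fixture: generic names on the outside, cat-poetry details only in `body`.
WORKFLOW = {
    "name": "generate-text",
    "jobs": {
        "fetch-data": {"name": "fetch data", "body": "// fetch a random cat fact"},
        "generate-text-with-claude": {
            "name": "generate text with Claude",
            "body": "// prompt: write a couplet in Swedish about the cat fact",
        },
        "generate-text-with-chatgpt": {
            "name": "generate text with ChatGPT",
            "body": "// prompt: write a couplet in Estonian about the cat fact",
        },
    },
}


def names_leak_topic(workflow, topic_words=("cat", "poem", "poetry")):
    """Return True if the workflow name or any step name gives the topic away."""
    names = [workflow["name"]] + [job["name"] for job in workflow["jobs"].values()]
    return any(word in name.lower() for name in names for word in topic_words)


def check_summary(route, response, expected_route="planner",
                  keywords=("cat", "estonian", "swedish")):
    """Return a list of failure messages; an empty list means the scenario passed."""
    failures = []
    if route != expected_route:
        failures.append(f"routed to {route!r}, expected {expected_route!r}")
    lowered = response.lower()
    for keyword in keywords:
        # Plain substring match: simple, but "cat" also matches e.g. "category".
        if keyword not in lowered:
            failures.append(f"missing keyword: {keyword}")
    return failures
```

Checking `names_leak_topic(WORKFLOW)` first guards the premise of the test: if the names already mention cats, a passing response no longer shows that the model read the body keys.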

Step Summary from Workflow View
Given:

Cat Poetry Competition YAML Loaded; same YAML as 1)
User on workflow view
Prompt: “What does the Claude step do?”
Evaluate:
That we route either to the job code assistant or planner+job code assistant
Describes the Claude prompt for a Swedish poem at minimum
Describes the role of the step in a workflow, i.e. mentions cat poetry competition (although see Question 1)
[Could be a bit vague though, it might be reasonable to just look at the overall workflow – a human might do that too]

Step Summary from Step Editor
Same as above, just change the page URL
Given:

Cat Poetry Competition YAML Loaded
User on job code view, in Claude step
Prompt: “What does this step do?”
Evaluate:
That we route to the job code assistant or planner+job code assistant
Describes the Claude prompt for a Swedish poem at minimum
Describes the role of the step in a workflow, i.e. mentions cat poetry competition [although see Question 1]

Edit Single Step from Workflow
Given:

Cat Poetry Competition YAML Loaded
User on workflow view
Prompt: “Modify ChatGPT step to ask for a haiku instead of a couplet.”
Evaluate:
Route to job code agent or planner+job code agent
Body key of the chatgpt step is changed and contains “haiku”
Rest of YAML unchanged
Job code agent input contains the correct job code
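One way these four checks might be expressed, assuming the workflow before and after the edit are available as parsed dicts and the text passed to the job code agent has been captured as a string. The `jobs` key, the step id, the route labels, and the helper name are hypothetical. Only the `body` key is taken from the description.

```python
import copy


def check_single_step_edit(before, after, step_id, keyword, agent_input, route,
                           allowed_routes=("job_code_agent", "planner+job_code_agent")):
    """Return failure messages for a single-step edit; empty list means pass."""
    failures = []
    if route not in allowed_routes:
        failures.append(f"unexpected route: {route}")

    old_body = before["jobs"][step_id]["body"]
    new_body = after["jobs"][step_id]["body"]
    if new_body == old_body:
        failures.append("body of target step was not changed")
    if keyword.lower() not in new_body.lower():
        failures.append(f"new body does not contain {keyword!r}")

    # Compare everything except the edited body by removing it from deep copies.
    before_rest = copy.deepcopy(before)
    after_rest = copy.deepcopy(after)
    before_rest["jobs"][step_id].pop("body")
    after_rest["jobs"][step_id].pop("body")
    if before_rest != after_rest:
        failures.append("rest of workflow changed")

    # The agent should have been given the original code of the target step.
    if old_body not in agent_input:
        failures.append("job code agent input lacks the original job code")
    return failures
```

Comparing deep copies with the target body removed keeps the "rest of YAML unchanged" check independent of how the edited body itself was rewritten.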

Edit Single Step from Step Editor
Same as above, just change the page URL
Given:

Cat Poetry Competition YAML Loaded
User on job code view for the ChatGPT step
Prompt: “Modify to ask for a haiku instead of a poem.”
Evaluate:
Route to job code agent or planner+job code agent
Body key of the chatgpt step is changed and contains “haiku”
Rest of YAML unchanged
Job code agent input contains the correct job code

Edit Multiple Steps from Workflow view
Given:

Cat Poetry Competition YAML Loaded
User on workflow view
Prompt: “I want all poems to be in French”
Evaluate:
Route to planner+job code agent 2x
“French” in both generation steps, and no “Swedish” or “Estonian”
Rest of YAML unchanged
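The language checks for this scenario might look like the sketch below. It again assumes a parsed workflow dict with a hypothetical `jobs` mapping, and assumes the harness records agent invocations as a list of names. The `"job_code_agent"` label and both helper names are illustrative rather than taken from the codebase.

```python
def check_language_swap(workflow, generation_steps, required="French",
                        forbidden=("Swedish", "Estonian")):
    """Return failure messages for the multi-step language edit; empty means pass."""
    failures = []
    for step_id in generation_steps:
        body = workflow["jobs"][step_id]["body"].lower()
        if required.lower() not in body:
            failures.append(f"{step_id}: missing {required}")
        for word in forbidden:
            if word.lower() in body:
                failures.append(f"{step_id}: still mentions {word}")
    return failures


def job_code_calls(call_log, agent_name="job_code_agent"):
    """Count how many times the job code agent appears in a recorded call log."""
    return sum(1 for name in call_log if name == agent_name)
```

For example, a test could assert that `check_language_swap(...)` returns an empty list and that `job_code_calls(call_log)` equals 2, matching the "2x" expectation above.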

Edit Multiple Steps from Step Editor
Same as above, just change the page URL
Given:

Cat Poetry Competition YAML Loaded
User on job code view for the ChatGPT step
Prompt: “I want all poems in the cat poetry competition to be in French”
Evaluate:
Route to planner+job code agent 2x
“French” in both generation steps, and no “Swedish” or “Estonian”
Rest of YAML unchanged

Add New Steps + edit workflow structure
Given:

Cat Poem YAML Loaded
User on workflow view
Prompt: “Make this a poetry competition. Send it to both Claude and ChatGPT. Send that to another Claude step to be the judge. Then send me the results.”
Evaluate:
Output similar to Cat Poetry Competition YAML

AI Usage

Please disclose how you've used AI in this work (it's cool, we just want to know!):

You can read more details in our Responsible AI Policy

hanna-paasivirta · 2026-04-09T15:34:23Z

@josephjclark this should require minimal review. The prompt change would just affect the global_chat service slightly and the tests are still a work in progress.

hanna-paasivirta added 3 commits April 7, 2026 23:27

add global agent tests

5c36171

redirect multi step tasks

9cb99e0

adjust test

5588d2e

hanna-paasivirta requested a review from josephjclark April 9, 2026 15:34

hanna-paasivirta added this to Core Apr 9, 2026

github-project-automation bot moved this to New Issues in Core Apr 9, 2026

hanna-paasivirta wants to merge 3 commits into main from global-agent-job-code
