Closed
6 changes: 3 additions & 3 deletions src/ai/tester.ts
@@ -546,8 +546,8 @@ export class Tester extends TaskAgent implements Agent {
const schema = z.object({
assessment: z.string().describe('Short review of current progress toward the main scenario goal'),
suggestion: z.string().describe('Specific next action recommendation'),
- recommendReset: z.boolean().optional().describe('Recommend reset() if persistent failures suggest navigation issues'),
- recommendStop: z.boolean().optional().describe('Recommend stop() if test is fundamentally incompatible or cannot proceed'),
+ recommendReset: z.boolean().describe('Recommend reset() if persistent failures suggest navigation issues'),
Suggestion: The recommendReset field is required in the Zod schema, but the model prompt does not explicitly guarantee it will always be returned, so generateObject can fail validation if the model omits it, causing response to be undefined and silently skipping the progress report logic. [possible bug]

Severity Level: Major ⚠️
- ❌ Progress checkpoint report skipped during test run.
- ⚠️ analyzeProgress() returns empty string instead of report.
- ⚠️ Test guidance (reset suggestion) may be lost.
Suggested change
- recommendReset: z.boolean().describe('Recommend reset() if persistent failures suggest navigation issues'),
+ recommendReset: z.boolean().optional().describe('Recommend reset() if persistent failures suggest navigation issues'),
Steps of Reproduction ✅
1. Run a test that reaches the progress analysis checkpoint so Tester.analyzeProgress() is
executed. The call site is in src/ai/tester.ts inside test() where the loop calls
this.analyzeProgress(task, currentState) (see the loop block that conditionally calls
analyzeProgress).

2. Inside src/ai/tester.ts, open the analyzeProgress function where the Zod schema is
defined (schema declaration starts at line 546 and includes the recommendReset field at
line 549).

3. The function builds a prompt and then calls the provider to parse the model output:
const model = this.provider.getModelForAgent('tester'); const response = await
this.provider.generateObject(..., schema, model); (the generateObject invocation
immediately follows the schema definition in the same function).

4. If the LLM response omits the boolean field recommendReset, Zod validation will fail
because recommendReset is required (line 549). generateObject will therefore not return a
valid response.object, and analyzeProgress will receive response as undefined or a failed
parse, causing: const result = response?.object; if (!result) return ''; — the progress
checkpoint report is skipped. Reproducing this is achieved by running a test scenario that
triggers analyzeProgress and forcing the model to reply without the recommendReset boolean
(e.g., with a concise textual assessment that lacks that field).
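
The failure mode in step 4 can be illustrated without the real provider. The sketch below is a dependency-free stand-in for the required-versus-optional check, not the project's Zod schema: `ProgressReport`, `checkProgress`, and the `resetRequired` flag are hypothetical names introduced only for illustration, and the hand-written validator merely approximates what schema validation would do.

```typescript
// Hypothetical stand-in for the progress schema; not the project's actual Zod code.
interface ProgressReport {
  assessment: string;
  suggestion: string;
  recommendReset?: boolean;
}

// Returns the parsed report, or undefined when validation fails.
// `resetRequired` mimics dropping `.optional()` from the schema field.
function checkProgress(raw: unknown, resetRequired: boolean): ProgressReport | undefined {
  if (typeof raw !== 'object' || raw === null) return undefined;
  const fields = raw as Record<string, unknown>;
  const assessment = fields['assessment'];
  const suggestion = fields['suggestion'];
  const reset = fields['recommendReset'];
  if (typeof assessment !== 'string' || typeof suggestion !== 'string') return undefined;
  if (reset === undefined) {
    // Field omitted: rejected when required, accepted when optional.
    return resetRequired ? undefined : { assessment, suggestion };
  }
  if (typeof reset !== 'boolean') return undefined;
  return { assessment, suggestion, recommendReset: reset };
}

// A model reply that omits recommendReset.
const reply = { assessment: 'Login form submitted', suggestion: 'Verify the dashboard loads' };

const strictResult = checkProgress(reply, true); // required field missing
const lenientResult = checkProgress(reply, false); // optional field missing
```

With the field treated as required, the same reply yields no report at all, which is the situation the early `return ''` then hides.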
Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** src/ai/tester.ts
**Line:** 549:549
**Comment:**
	*Possible Bug:* The `recommendReset` field is required in the Zod schema, but the model prompt does not explicitly guarantee it will always be returned, so `generateObject` can fail validation if the model omits it, causing `response` to be undefined and silently skipping the progress report logic.

Validate the correctness of the flagged issue. If it is correct, how can I resolve it? If you propose a fix, implement it and please keep it concise.

Contributor

This can be confusing: it requires the LLM to fill in all fields, so it would now provide a recommendation that it was previously able to avoid.

Probably the best way is to remove all the recommend* fields altogether

and keep only suggestion.

Also, we can probably use an enum if this is allowed:

recommendedTool: 'reset|stop|continue'
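
A rough sketch of what that single field could look like in plain TypeScript, assuming the three values listed above. `RECOMMENDED_TOOLS`, `RecommendedTool`, and `toRecommendedTool` are hypothetical names, and falling back to 'continue' for unknown values is an assumption of this sketch rather than something decided in the review. In the schema itself this would presumably be expressed with `z.enum([...])`.

```typescript
// Hypothetical single-field alternative to the two boolean flags.
const RECOMMENDED_TOOLS = ['reset', 'stop', 'continue'] as const;
type RecommendedTool = (typeof RECOMMENDED_TOOLS)[number];

// Narrow an arbitrary model value to one of the allowed tools.
// Assumption: anything unrecognized (or missing) means "keep going".
function toRecommendedTool(value: unknown): RecommendedTool {
  const match = RECOMMENDED_TOOLS.find((tool) => tool === value);
  return match ?? 'continue';
}
```

One field with a neutral 'continue' value lets the model answer every time without being pushed into recommending reset() or stop().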

+ recommendStop: z.boolean().describe('Recommend stop() if test is fundamentally incompatible or cannot proceed'),
Suggestion: The recommendStop field is also required in the Zod schema without the prompt explicitly enforcing its presence, so if the model does not include it, generateObject may fail validation and result handling will be skipped. [possible bug]

Severity Level: Major ⚠️
- ❌ Stop recommendation lost when model omits the flag.
- ⚠️ analyzeProgress() may not record persistent failure actions.
- ⚠️ Tester misses critical stop/reset guidance.
Suggested change
- recommendStop: z.boolean().describe('Recommend stop() if test is fundamentally incompatible or cannot proceed'),
+ recommendStop: z.boolean().optional().describe('Recommend stop() if test is fundamentally incompatible or cannot proceed'),
Steps of Reproduction ✅
1. Trigger the same code path as in the previous comment: execute a test that leads to
Tester.analyzeProgress() being called from the main test loop (the conditional call to
analyzeProgress is in src/ai/tester.ts within the main loop).

2. Inspect the schema in analyzeProgress where recommendStop is declared (schema block at
lines 546-551, with recommendStop at line 550).

3. The function calls the provider to parse the LLM output via
this.provider.generateObject(..., schema, model). If the LLM response includes assessment
and suggestion but omits recommendStop, Zod will reject the object because recommendStop
is required (line 550).

4. When validation fails, generateObject yields no usable response.object, analyzeProgress
treats that as no result (const result = response?.object; if (!result) return ''), and
the function exits without adding the expected progress note or advising stop/reset.
Reproduce by running any test that causes analyzeProgress to run and configuring the LLM
(or using a prompt/seed) so it omits the recommendStop boolean.
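
If the flags stay optional, the consuming code has to tolerate their absence. The sketch below shows one way that might look, assuming a missing flag means "no recommendation". `ProgressResult` and `formatProgressNote` are hypothetical names, the note format is invented for illustration, and preferring stop() over reset() when both are set is an assumption of this sketch.

```typescript
// Hypothetical shape of the parsed object when both flags are optional.
interface ProgressResult {
  assessment: string;
  suggestion: string;
  recommendReset?: boolean;
  recommendStop?: boolean;
}

// Builds the progress note; the note format here is illustrative only.
function formatProgressNote(result: ProgressResult | undefined): string {
  if (!result) return ''; // mirrors the early return quoted in step 4
  // Assumption: an omitted flag is treated as false.
  const reset = result.recommendReset ?? false;
  const stop = result.recommendStop ?? false;
  const lines = [`Assessment: ${result.assessment}`, `Suggestion: ${result.suggestion}`];
  if (stop) {
    lines.push('Recommended: stop()');
  } else if (reset) {
    lines.push('Recommended: reset()');
  }
  return lines.join('\n');
}
```

A reply that carries only the assessment and suggestion still produces a note, so the checkpoint report is not lost when the model leaves the flags out.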
Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** src/ai/tester.ts
**Line:** 550:550
**Comment:**
	*Possible Bug:* The `recommendStop` field is also required in the Zod schema without the prompt explicitly enforcing its presence, so if the model does not include it, `generateObject` may fail validation and result handling will be skipped.

Validate the correctness of the flagged issue. If it is correct, how can I resolve it? If you propose a fix, implement it and please keep it concise.

});

let problemContext = '';
@@ -664,7 +664,7 @@ export class Tester extends TaskAgent implements Agent {
const schema = z.object({
summary: z.string().describe('Concise overview of the test findings'),
scenarioAchieved: z.boolean().describe('Indicates if the scenario goal appears satisfied'),
- recommendation: z.string().optional().describe('Follow-up suggestion if needed'),
+ recommendation: z.string().describe('Follow-up suggestion if needed'),
Suggestion: In finalReview, the new recommendation field is required in the Zod schema but the prompt never asks the model to provide it, so generateObject can fail schema validation if the field is omitted, preventing summaries from being processed. [possible bug]

Severity Level: Major ⚠️
- ❌ Final summary not recorded in task.summary.
- ❌ finalReview may not call task.finish().
- ⚠️ Test session ends without final notes.
Suggested change
- recommendation: z.string().describe('Follow-up suggestion if needed'),
+ recommendation: z.string().optional().describe('Follow-up suggestion if needed'),
Steps of Reproduction ✅
1. Execute a test that reaches the end-of-test cleanup so Tester.finalReview(task) is
invoked. finalReview is defined in src/ai/tester.ts (the function signature is private
async finalReview(task: Test): Promise<void>; see the finalReview function block).

2. In finalReview, inspect the Zod schema declared for the final summary (schema block at
lines 664-668, where recommendation is currently required at line 667).

3. finalReview then calls the provider to parse the LLM output: const model =
this.provider.getModelForAgent('tester'); const response = await
this.provider.generateObject(..., schema, model); (the generateObject invocation follows
the schema definition in the same function).

4. If the LLM provides summary and scenarioAchieved but does not emit a recommendation
string, Zod validation will fail because recommendation is required (line 667). As a
result response?.object is undefined and finalReview returns early (if (!result) return;),
preventing task.summary assignment and the task.finish() logic from executing. Reproduce
by running a complete test and using an LLM reply that contains summary and
scenarioAchieved but omits recommendation.
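
The other half of the concern is that the prompt never asks for the field. One possible mitigation, sketched below under stated assumptions, is to generate the field instructions from the same list that defines the schema, so a required field is always mentioned. `FieldSpec` and `buildFieldInstructions` are hypothetical helpers, not part of src/ai/tester.ts; the descriptions are copied from the schema in the diff.

```typescript
// Hypothetical description of one schema field, kept next to the schema.
interface FieldSpec {
  name: string;
  type: 'string' | 'boolean';
  required: boolean;
  description: string;
}

// Renders one instruction line per field for inclusion in the prompt.
function buildFieldInstructions(fields: FieldSpec[]): string {
  return fields
    .map((field) => {
      const presence = field.required ? 'required' : 'optional';
      return `- ${field.name} (${field.type}, ${presence}): ${field.description}`;
    })
    .join('\n');
}

// Field list mirroring the finalReview schema after this change.
const finalReviewFields: FieldSpec[] = [
  { name: 'summary', type: 'string', required: true, description: 'Concise overview of the test findings' },
  { name: 'scenarioAchieved', type: 'boolean', required: true, description: 'Indicates if the scenario goal appears satisfied' },
  { name: 'recommendation', type: 'string', required: true, description: 'Follow-up suggestion if needed' },
];

const instructions = buildFieldInstructions(finalReviewFields);
```

Appending `instructions` to the prompt would state explicitly that recommendation is expected, though it still cannot guarantee the model complies.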
Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** src/ai/tester.ts
**Line:** 667:667
**Comment:**
	*Possible Bug:* In `finalReview`, the new `recommendation` field is required in the Zod schema but the prompt never asks the model to provide it, so `generateObject` can fail schema validation if the field is omitted, preventing summaries from being processed.

Validate the correctness of the flagged issue. If it is correct, how can I resolve it? If you propose a fix, implement it and please keep it concise.

});

const model = this.provider.getModelForAgent('tester');