Enhance agent module with draft test 2 #2
Closed
haasonsaas wants to merge 1 commit into
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
PR Summary
Low Risk
Overview: This diff makes no changes to the agent module, compiler crates, or runtime code. It touches only this documentation/marker file, which fits a draft or branch-identification test rather than the functional enhancements described in the PR metadata.
Reviewed by Cursor Bugbot for commit c994672. Bugbot is set up for automated code reviews on this repo.
haasonsaas added a commit that referenced this pull request on Jun 21, 2026
Four improvements to make the runtime production-grade:
1. Structured output via response_format json_schema (server-enforced).
New schema.rs renders Act types to JSON Schema; the HTTP host passes it
as response_format: { type: "json_schema", strict: true }. The provider
now *guarantees* the shape: the model no longer guesses field names, and
fields are no longer silently dropped during coercion. Falls back to a
request without a schema on providers that return HTTP 400.
2. Verifier-based accept gate (second model call, not logprob proxy).
Host::verify() asks a second model to evaluate the candidate output and
returns { confidence, reason }. eval_infer uses this for the accept gate
instead of the token-logprob geometric mean (which measures fluency, not
correctness). OPENAI_VERIFIER_MODEL configures a separate verifier;
defaults to the same model. Mock hosts return 1.0 (no-op gate).
3. Missing GitHub + evalops ops so fix_regression.act can run.
gh: close_pull_request, get_logs (Actions jobs API).
gh.compare now returns the diff/patch text, not just html_url.
gh.create_pull_request takes a base param (was hardcoded "main").
eo: fetch_logs, failing_tests (CI job/step results from Actions API).
4. Retry on transient errors (429/5xx) with exponential backoff.
Cost tracking via OPENAI_COST_PER_1K_TOKENS_MICROS (defaults to
gpt-4o-mini pricing). Consolidated blocking_send() helper.
Also: 5 schema unit tests (record/array/primitive/Result/enum rendering).
Verified end-to-end against real OpenRouter + GitHub:
summarize.act => {"ok":{"text":"Act is a pre-alpha..."}}
open_pr.act => {"ok":"#2"}
(PR opened with model-drafted title/body, then closed + branch deleted.)
59 tests pass, clippy 0, fmt clean.
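Item 1 of the commit message describes rendering Act types to JSON Schema for server-enforced structured output. The sketch below illustrates that idea under stated assumptions: `ActType`, `to_json_schema` and `response_format` are hypothetical names (the real schema.rs is not shown), and JSON is emitted as strings so no crates are needed. It follows OpenAI's documented Structured Outputs shape, where `name`, `strict` and `schema` sit inside a `json_schema` object, and strict mode's rules that every property appears in `required` and `additionalProperties` is `false`. The `ok`/`err` encoding of `Result` is an assumption modelled on the `{"ok":...}` outputs reported above.

```rust
// Hypothetical stand-in for Act's types; the real schema.rs definitions are not shown in this PR.
#[derive(Debug, Clone)]
pub enum ActType {
    Text,
    Int,
    Float,
    Bool,
    Array(Box<ActType>),
    Record(Vec<(String, ActType)>),
    Enum(Vec<String>),
    Result(Box<ActType>, Box<ActType>),
}

/// Renders a string as a JSON string literal, escaping quotes, backslashes and control characters.
fn json_str(s: &str) -> String {
    let mut out = String::from("\"");
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// An object with exactly one required key; used for the `ok` / `err` arms of a Result.
fn single_key_object(key: &str, value: &ActType) -> String {
    format!(
        r#"{{"type":"object","properties":{{{}:{}}},"required":[{}],"additionalProperties":false}}"#,
        json_str(key),
        to_json_schema(value),
        json_str(key)
    )
}

/// Renders an Act type as JSON Schema. Records list every field in `required` and set
/// `additionalProperties: false`, as OpenAI's strict mode requires.
pub fn to_json_schema(ty: &ActType) -> String {
    match ty {
        ActType::Text => r#"{"type":"string"}"#.to_string(),
        ActType::Int => r#"{"type":"integer"}"#.to_string(),
        ActType::Float => r#"{"type":"number"}"#.to_string(),
        ActType::Bool => r#"{"type":"boolean"}"#.to_string(),
        ActType::Array(item) => format!(r#"{{"type":"array","items":{}}}"#, to_json_schema(item)),
        ActType::Record(fields) => {
            let props: Vec<String> = fields
                .iter()
                .map(|(k, v)| format!("{}:{}", json_str(k), to_json_schema(v)))
                .collect();
            let required: Vec<String> = fields.iter().map(|(k, _)| json_str(k)).collect();
            format!(
                r#"{{"type":"object","properties":{{{}}},"required":[{}],"additionalProperties":false}}"#,
                props.join(","),
                required.join(",")
            )
        }
        ActType::Enum(variants) => {
            let vs: Vec<String> = variants.iter().map(|v| json_str(v)).collect();
            format!(r#"{{"type":"string","enum":[{}]}}"#, vs.join(","))
        }
        // Result<T, E> becomes {"ok": T} or {"err": E}, matching outputs like {"ok":{...}}.
        ActType::Result(ok, err) => format!(
            r#"{{"anyOf":[{},{}]}}"#,
            single_key_object("ok", ok),
            single_key_object("err", err)
        ),
    }
}

/// Wraps a schema as an OpenAI-style `response_format` value: `name`, `strict` and `schema`
/// sit inside the `json_schema` object.
pub fn response_format(name: &str, ty: &ActType) -> String {
    format!(
        r#"{{"type":"json_schema","json_schema":{{"name":{},"strict":true,"schema":{}}}}}"#,
        json_str(name),
        to_json_schema(ty)
    )
}
```

OpenAI's documentation also states that the root of a strict schema must be an object rather than an `anyOf`, so a top-level `Result` would need wrapping before it is sent; this sketch leaves that step out.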
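Item 2 replaces the token-logprob geometric mean with a second model call. The sketch below contrasts the two signals. It assumes a hypothetical `Verifier` trait in place of the real `Host`, a caller-chosen `threshold` (the commit does not state one), and a rule that an empty OPENAI_VERIFIER_MODEL counts as unset.

```rust
use std::env;

/// Mirrors the `{ confidence, reason }` result that Host::verify() is described as returning.
#[derive(Debug, Clone, PartialEq)]
pub struct Verdict {
    pub confidence: f64,
    pub reason: String,
}

/// Hypothetical trait standing in for the host; only the verifier call is modelled.
pub trait Verifier {
    fn verify(&self, task: &str, candidate: &str) -> Verdict;
}

/// Mock host: always reports full confidence, so the gate never rejects (a no-op gate).
pub struct MockHost;

impl Verifier for MockHost {
    fn verify(&self, _task: &str, _candidate: &str) -> Verdict {
        Verdict { confidence: 1.0, reason: "mock host".to_string() }
    }
}

/// Test double returning a fixed confidence, standing in for a real second model call.
pub struct FixedVerifier(pub f64);

impl Verifier for FixedVerifier {
    fn verify(&self, _task: &str, _candidate: &str) -> Verdict {
        Verdict { confidence: self.0, reason: "fixed".to_string() }
    }
}

/// The replaced proxy: geometric mean of token probabilities, exp(mean(logprob)).
/// It scores how likely the tokens were (fluency), not whether the answer is right.
pub fn logprob_geometric_mean(logprobs: &[f64]) -> f64 {
    if logprobs.is_empty() {
        return 0.0;
    }
    (logprobs.iter().sum::<f64>() / logprobs.len() as f64).exp()
}

/// Picks the verifier model: the env value if set and non-empty, else the main model.
pub fn resolve_verifier_model(env_value: Option<String>, main_model: &str) -> String {
    env_value
        .filter(|m| !m.is_empty())
        .unwrap_or_else(|| main_model.to_string())
}

pub fn verifier_model(main_model: &str) -> String {
    resolve_verifier_model(env::var("OPENAI_VERIFIER_MODEL").ok(), main_model)
}

/// Accept gate: keep the candidate only if the verifier's confidence reaches `threshold`.
pub fn accept<V: Verifier>(
    host: &V,
    task: &str,
    candidate: &str,
    threshold: f64,
) -> Result<String, Verdict> {
    let verdict = host.verify(task, candidate);
    if verdict.confidence >= threshold {
        Ok(candidate.to_string())
    } else {
        Err(verdict)
    }
}
```

The geometric mean only measures how probable the emitted tokens were, so a fluent but wrong answer can still score highly. The verifier gate bases the accept decision on an explicit judgement of the output instead, and the mock's constant 1.0 makes it accept everything.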
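Item 3 adds GitHub and CI ops. The sketch below shows the REST calls those ops could map onto, written as plain request descriptions so it runs without a network or HTTP crate. The endpoint paths and the `application/vnd.github.diff` media type come from GitHub's REST API. The function names, the `Request`/`Job`/`Step` types and the `failing_steps` summary are hypothetical and are not the runtime's actual code.

```rust
/// Transport-agnostic description of one GitHub REST call (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct Request {
    pub method: &'static str,
    pub path: String,
    pub accept: &'static str,
    pub body: Option<String>,
}

const JSON: &str = "application/vnd.github+json";
/// With this media type the compare endpoint returns the diff text itself.
const DIFF: &str = "application/vnd.github.diff";

/// Minimal JSON string escaping (quotes, backslashes, newlines only).
fn json_str(s: &str) -> String {
    let escaped = s.replace('\\', "\\\\").replace('"', "\\\"").replace('\n', "\\n");
    format!("\"{escaped}\"")
}

/// gh.compare: diff between two refs, requested as `base...head`.
pub fn compare(owner: &str, repo: &str, base: &str, head: &str) -> Request {
    let path = format!("/repos/{owner}/{repo}/compare/{base}...{head}");
    Request { method: "GET", path, accept: DIFF, body: None }
}

/// gh.create_pull_request: `base` is a parameter rather than a hardcoded "main".
pub fn create_pull_request(owner: &str, repo: &str, title: &str, head: &str, base: &str, body: &str) -> Request {
    let json = format!(
        "{{\"title\":{},\"head\":{},\"base\":{},\"body\":{}}}",
        json_str(title),
        json_str(head),
        json_str(base),
        json_str(body)
    );
    let path = format!("/repos/{owner}/{repo}/pulls");
    Request { method: "POST", path, accept: JSON, body: Some(json) }
}

/// gh.close_pull_request: GitHub closes a PR by updating its state to "closed".
pub fn close_pull_request(owner: &str, repo: &str, number: u64) -> Request {
    let path = format!("/repos/{owner}/{repo}/pulls/{number}");
    Request { method: "PATCH", path, accept: JSON, body: Some(r#"{"state":"closed"}"#.to_string()) }
}

/// Job logs; GitHub responds with a redirect to a short-lived download URL.
pub fn get_job_logs(owner: &str, repo: &str, job_id: u64) -> Request {
    let path = format!("/repos/{owner}/{repo}/actions/jobs/{job_id}/logs");
    Request { method: "GET", path, accept: JSON, body: None }
}

/// Jobs of a workflow run, each with its steps and their conclusions.
pub fn list_run_jobs(owner: &str, repo: &str, run_id: u64) -> Request {
    let path = format!("/repos/{owner}/{repo}/actions/runs/{run_id}/jobs");
    Request { method: "GET", path, accept: JSON, body: None }
}

/// Already-parsed view of the jobs response; `conclusion` is null while a job runs.
pub struct Step {
    pub name: String,
    pub conclusion: Option<String>,
}

pub struct Job {
    pub name: String,
    pub conclusion: Option<String>,
    pub steps: Vec<Step>,
}

/// failing_tests-style summary: "job / step" per failed step, or the job name if the
/// job failed without any failed step.
pub fn failing_steps(jobs: &[Job]) -> Vec<String> {
    jobs.iter()
        .flat_map(|job| {
            let failed: Vec<String> = job
                .steps
                .iter()
                .filter(|s| s.conclusion.as_deref() == Some("failure"))
                .map(|s| format!("{} / {}", job.name, s.name))
                .collect();
            if failed.is_empty() && job.conclusion.as_deref() == Some("failure") {
                vec![job.name.clone()]
            } else {
                failed
            }
        })
        .collect()
}
```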
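Item 4's retry behaviour is sketched below with an injectable `sleep` so that the policy can be tested. The 500 ms base delay, the 8 s cap and the retry count are assumed values. `send_with_retry` is a hypothetical stand-in for whatever the consolidated blocking_send() helper actually does.

```rust
use std::thread;
use std::time::Duration;

/// 429 (rate limited) and any 5xx are treated as transient; everything else fails fast.
pub fn is_transient(status: u16) -> bool {
    status == 429 || (500..=599).contains(&status)
}

/// Delay before retry `attempt` (0-based): base * 2^attempt, capped at `max`.
pub fn backoff(attempt: u32, base: Duration, max: Duration) -> Duration {
    base.saturating_mul(2u32.saturating_pow(attempt)).min(max)
}

/// Retries `send` on transient statuses with exponential backoff. `send` reports the
/// HTTP status as its error; `sleep` is injectable so the policy can be tested.
pub fn send_with_retry<T, F, S>(mut send: F, max_retries: u32, mut sleep: S) -> Result<T, u16>
where
    F: FnMut() -> Result<T, u16>,
    S: FnMut(Duration),
{
    // Hypothetical policy values; the PR does not state the real ones.
    let base = Duration::from_millis(500);
    let max = Duration::from_secs(8);
    let mut attempt = 0;
    loop {
        match send() {
            Ok(value) => return Ok(value),
            Err(status) if is_transient(status) && attempt < max_retries => {
                sleep(backoff(attempt, base, max));
                attempt += 1;
            }
            Err(status) => return Err(status),
        }
    }
}

/// Production wrapper that actually blocks the thread between attempts.
pub fn send_blocking_with_retry<T, F>(send: F, max_retries: u32) -> Result<T, u16>
where
    F: FnMut() -> Result<T, u16>,
{
    send_with_retry(send, max_retries, thread::sleep)
}
```

Production clients often add random jitter to each delay so that many callers hitting the same 429 do not retry in lockstep. This sketch omits jitter to keep the delays deterministic.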
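Cost tracking in item 4 is configured in micro-dollars per 1,000 tokens. The sketch below shows that arithmetic, assuming integer micro-dollar accounting with round-up. `CostTracker` and `parse_rate` are hypothetical, and so is the fallback rate the caller passes in, because the PR does not show the tracker or the exact gpt-4o-mini default.

```rust
use std::env;

/// Reads OPENAI_COST_PER_1K_TOKENS_MICROS (micro-dollars per 1,000 tokens), falling back
/// to `default_micros` when it is unset or unparsable.
pub fn cost_rate_from_env(default_micros: u64) -> u64 {
    parse_rate(env::var("OPENAI_COST_PER_1K_TOKENS_MICROS").ok(), default_micros)
}

/// Pure parsing step, kept separate from the environment lookup so it can be tested.
pub fn parse_rate(raw: Option<String>, default_micros: u64) -> u64 {
    raw.and_then(|s| s.trim().parse().ok()).unwrap_or(default_micros)
}

/// Cost of `tokens` in micro-dollars, rounded up so any non-zero usage costs at least 1.
pub fn cost_micros(tokens: u64, per_1k_micros: u64) -> u64 {
    tokens.saturating_mul(per_1k_micros).saturating_add(999) / 1000
}

/// Running totals for one run (hypothetical; the PR does not show its tracker).
#[derive(Debug, Default)]
pub struct CostTracker {
    pub per_1k_micros: u64,
    pub total_tokens: u64,
    pub total_micros: u64,
}

impl CostTracker {
    pub fn new(per_1k_micros: u64) -> Self {
        CostTracker { per_1k_micros, ..Default::default() }
    }

    /// Records one call's usage (prompt + completion tokens) and returns its cost.
    pub fn record(&mut self, tokens: u64) -> u64 {
        let cost = cost_micros(tokens, self.per_1k_micros);
        self.total_tokens += tokens;
        self.total_micros += cost;
        cost
    }

    /// Total spend in dollars, for display only; accounting stays in integer micros.
    pub fn total_dollars(&self) -> f64 {
        self.total_micros as f64 / 1_000_000.0
    }
}
```

Keeping totals in integer micros avoids floating-point drift when many small calls are summed. Converting to dollars only for display preserves that property.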
This PR introduces the second draft test for the agent module, enhancing functionality and addressing previous issues.