Commit d475fff

Add user assertions for correctness checking and allow full subcomponent replacement
1 parent c73e791 commit d475fff

1 file changed: AAE.md (18 additions & 3 deletions)
@@ -56,23 +56,38 @@ To set up a new tuning session, work with the user to:
    - The **benchmark harness** (read-only) that defines the metric and evaluation procedure.
    - Any **configuration or constants** that are fixed.
 4. **Identify constraints**: Understand what the agent can and cannot change (see below).
-5. **Verify benchmark works**: Run the benchmark once to confirm it produces output.
-6. **Initialize results.tsv**: Create `results.tsv` with just the header row.
-7. **Confirm and go**: Confirm setup looks good with the user, then begin.
+5. **Collect assertions**: Ask the user for any correctness invariants that must hold after every change (see below).
+6. **Verify benchmark works**: Run the benchmark once to confirm it produces output.
+7. **Initialize results.tsv**: Create `results.tsv` with just the header row.
+8. **Confirm and go**: Confirm setup looks good with the user, then begin.
 
 ## Constraints
 
 The user defines these per project. The agent must respect them strictly.
 
 **What the agent CAN do:**
 - Modify the designated target file(s). Everything within them is fair game: algorithms, data structures, parameters, control flow, memory layout, parallelism, etc.
+- Replace entire subcomponents (e.g. swap one algorithm for a fundamentally different one, replace a data structure with an alternative, rewrite a module from scratch) if the agent has reason to believe this will improve the target metric. Optimizations are not limited to incremental tuning; architectural changes and algorithmic replacements are encouraged when the analysis supports them.
 
 **What the agent CANNOT do:**
 - Modify the benchmark harness or evaluation code.
 - Install new packages or add dependencies beyond what is already available.
 - Modify the metric definition or measurement methodology.
 - Change the time budget or input data.
 
+## Assertions
+
+The user may define correctness assertions: invariants that must hold after every change the agent makes. These act as a safety net, ensuring that optimizations do not silently break the program's correctness.
+
+Examples of assertions:
+- "The output must be a valid partition (every node assigned to exactly one block, no block empty)."
+- "The sorted output must be a permutation of the input."
+- "The loss must be finite and non-negative after every training step."
+
+During setup, ask the user for any assertions they want enforced. If provided, verify them after every implementation, before running the full benchmark. If an assertion fails, the change is incorrect; discard it immediately (no need to benchmark) and log it as a crash.
+
+Assertions are distinct from the benchmark metric. The metric measures performance; assertions guard correctness. A change that improves the metric but violates an assertion is always discarded.
+
 ## The AE Cycle in Practice
 
 Each iteration of the loop implements one full AE cycle:
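The three example assertions added by this commit can be sketched as plain Python checks. This is an illustrative sketch only; the function names, signatures, and sample data below are assumptions, not part of AAE.md or the commit:

```python
import math

def assert_valid_partition(blocks, nodes):
    """Every node is assigned to exactly one block, and no block is empty."""
    assigned = [n for block in blocks for n in block]
    assert sorted(assigned) == sorted(nodes), "each node must appear in exactly one block"
    assert all(blocks), "no block may be empty"

def assert_permutation(output, original):
    """The sorted output must be a permutation of the input."""
    assert sorted(output) == sorted(original), "output is not a permutation of the input"

def assert_loss_valid(loss):
    """The loss must be finite and non-negative."""
    assert math.isfinite(loss) and loss >= 0.0, f"invalid loss: {loss}"

# Per the workflow above: run assertions after every implementation,
# before the full benchmark; on failure, discard the change and log a crash.
assert_valid_partition([[0, 2], [1, 3]], nodes=[0, 1, 2, 3])
assert_permutation(sorted([3, 1, 2]), [3, 1, 2])
assert_loss_valid(0.137)
print("all assertions passed")
```

Each check is cheap relative to a benchmark run, which is why a failed assertion can short-circuit the cycle without benchmarking.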
