Context
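Tracks the schema changes required before the cyber capability proxy test tasks (from #30) can be inserted into the bakeoff database: three schema additions (the agent_traces table, the run_model_metrics.trace_id column, and the tasks.uplift_baseline_task_id column), plus a cyber_safety category seed row and a schema_versions bump. These are blocking items for C4 (Agentic Containment) and C6 (Non-Expert Uplift Measurement). C1-direct, C2, and C3 have no schema blockers and can proceed independently.
Identified in #28 (Gaps 1–3) and specified in #30 (schema migration plan sections).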
Required changes
1. agent_traces table (Gap 1 from #28)
Required for: C4 (Agentic Containment) and any multi-step agentic category.
```sql
-- gen_random_uuid() is built in from PostgreSQL 13; older versions need the pgcrypto extension.
CREATE TABLE agent_traces (
    trace_id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    run_id            UUID NOT NULL REFERENCES runs ON DELETE CASCADE,
    prompt_id         INT NOT NULL REFERENCES prompts,
    model_id          UUID NOT NULL REFERENCES models,
    step_index        INT NOT NULL,   -- position of the step within one agentic episode
    action_type       TEXT NOT NULL,
        -- 'bash_command', 'file_read', 'file_write', 'http_request', 'tool_call', 'reasoning_step'
    action_payload    TEXT,
    observation       TEXT,
    boundary_violated BOOLEAN NOT NULL DEFAULT FALSE,
    cost_tokens       INT,
    recorded_at       TIMESTAMPTZ NOT NULL DEFAULT now(),
    UNIQUE (run_id, prompt_id, model_id, step_index)   -- one row per step per episode
);
```
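As a rough illustration of how one agentic episode maps onto this table (one row per step, keyed by run, prompt, model, and step_index), here is a minimal sketch. It assumes an in-memory SQLite database standing in for PostgreSQL (UUID becomes TEXT filled from uuid4(), BOOLEAN becomes 0/1, and recorded_at and the foreign keys to runs/prompts/models are dropped); the helpers record_step and containment_summary and all IDs are hypothetical.

```python
import sqlite3
import uuid

# In-memory SQLite stand-in for the PostgreSQL DDL (assumption, for a runnable sketch):
# UUID -> TEXT filled from uuid4(), BOOLEAN -> INTEGER 0/1, recorded_at and the
# REFERENCES clauses to runs/prompts/models are dropped.
SCHEMA = """
CREATE TABLE agent_traces (
    trace_id          TEXT PRIMARY KEY,
    run_id            TEXT NOT NULL,
    prompt_id         INTEGER NOT NULL,
    model_id          TEXT NOT NULL,
    step_index        INTEGER NOT NULL,
    action_type       TEXT NOT NULL,
    action_payload    TEXT,
    observation       TEXT,
    boundary_violated INTEGER NOT NULL DEFAULT 0,
    cost_tokens       INTEGER,
    UNIQUE (run_id, prompt_id, model_id, step_index)
);
"""


def record_step(conn, run_id, prompt_id, model_id, step_index, action_type,
                payload=None, observation=None, violated=False, cost_tokens=None):
    """Append one agent step (hypothetical helper) and return its new trace_id.

    The UNIQUE constraint raises sqlite3.IntegrityError if the same
    (run_id, prompt_id, model_id, step_index) is recorded twice.
    """
    trace_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO agent_traces (trace_id, run_id, prompt_id, model_id, step_index,"
        " action_type, action_payload, observation, boundary_violated, cost_tokens)"
        " VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
        (trace_id, run_id, prompt_id, model_id, step_index, action_type,
         payload, observation, int(violated), cost_tokens),
    )
    return trace_id


def containment_summary(conn, run_id, prompt_id, model_id):
    """Return (step_count, any_boundary_violated, total_cost_tokens) for one episode."""
    count, violated, tokens = conn.execute(
        "SELECT COUNT(*), MAX(boundary_violated), SUM(cost_tokens) FROM agent_traces"
        " WHERE run_id = ? AND prompt_id = ? AND model_id = ?",
        (run_id, prompt_id, model_id),
    ).fetchone()
    return count, bool(violated), tokens


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
record_step(conn, "run-1", 7, "model-a", 0, "bash_command", "ls /tmp", "a.txt", cost_tokens=40)
record_step(conn, "run-1", 7, "model-a", 1, "http_request", "GET http://example.invalid",
            violated=True, cost_tokens=25)
print(containment_summary(conn, "run-1", 7, "model-a"))  # (2, True, 65)
```

Taking MAX over boundary_violated is one simple way to collapse an episode into a single "did it ever break containment" flag; a real runner may score containment differently.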
2. run_model_metrics.trace_id column
Links a containment result row to its agentic trace. Nullable; only set for agentic tasks.
```sql
ALTER TABLE run_model_metrics
    ADD COLUMN IF NOT EXISTS trace_id UUID REFERENCES agent_traces(trace_id);
```
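The nullable link can be sketched as follows. This again uses an in-memory SQLite stand-in, with foreign-key enforcement switched on so that a dangling trace_id is rejected. Only trace_id comes from the issue; metric_id and contained are hypothetical placeholders for run_model_metrics' real columns.

```python
import sqlite3

# In-memory SQLite stand-in (assumption). Only trace_id comes from the issue;
# metric_id and contained are hypothetical placeholders for run_model_metrics' real columns.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when this is on
conn.executescript("""
CREATE TABLE agent_traces (
    trace_id          TEXT PRIMARY KEY,
    boundary_violated INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE run_model_metrics (
    metric_id INTEGER PRIMARY KEY,
    contained INTEGER
);
-- Same shape as the migration: a nullable FK column added to an existing table.
ALTER TABLE run_model_metrics ADD COLUMN trace_id TEXT REFERENCES agent_traces(trace_id);
""")

conn.execute("INSERT INTO agent_traces VALUES ('t-1', 1)")
# Agentic task: the metrics row links to its trace.
conn.execute("INSERT INTO run_model_metrics (metric_id, contained, trace_id) VALUES (1, 0, 't-1')")
# Static task: trace_id is simply left NULL.
conn.execute("INSERT INTO run_model_metrics (metric_id, contained) VALUES (2, 1)")


def metrics_with_traces(conn):
    """LEFT JOIN so that static-task rows (trace_id IS NULL) are kept."""
    return conn.execute(
        "SELECT m.metric_id, m.trace_id, t.boundary_violated"
        " FROM run_model_metrics AS m"
        " LEFT JOIN agent_traces AS t ON t.trace_id = m.trace_id"
        " ORDER BY m.metric_id"
    ).fetchall()


print(metrics_with_traces(conn))  # [(1, 't-1', 1), (2, None, None)]
```

A LEFT JOIN rather than an inner join is what keeps non-agentic result rows in reports that also pull trace data.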
3. tasks.uplift_baseline_task_id column (Gap 2 from #28)
Required for: C6 (Non-Expert Uplift Measurement). This is a self-referential FK: each tier-2 sub-task points to its tier-0 baseline sub-task, and the runner computes the uplift delta by joining on this FK.
```sql
ALTER TABLE tasks
    ADD COLUMN IF NOT EXISTS uplift_baseline_task_id INT REFERENCES tasks(task_id);
```
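One way the runner's self-join could look is sketched below. The issue does not say where per-task scores are stored, so the task_scores table, its score column, the task names, and the sign convention (tier-2 score minus tier-0 baseline score for the same model) are all assumptions; SQLite stands in for PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tasks (
    task_id                 INTEGER PRIMARY KEY,
    name                    TEXT NOT NULL,
    uplift_baseline_task_id INTEGER REFERENCES tasks(task_id)  -- self-referential FK
);
-- Hypothetical score table: the issue does not say where per-task results live.
CREATE TABLE task_scores (
    task_id  INTEGER NOT NULL REFERENCES tasks(task_id),
    model_id TEXT NOT NULL,
    score    REAL NOT NULL,
    PRIMARY KEY (task_id, model_id)
);
""")

# Uplift delta = tier-2 score minus the tier-0 baseline score for the same model
# (the sign convention is an assumption). Baseline tasks have a NULL FK, so the
# second join drops them automatically.
UPLIFT_DELTA_SQL = """
SELECT t.task_id, s.model_id, s.score - b.score AS uplift_delta
FROM tasks AS t
JOIN task_scores AS s ON s.task_id = t.task_id
JOIN task_scores AS b ON b.task_id = t.uplift_baseline_task_id
                     AND b.model_id = s.model_id
ORDER BY t.task_id, s.model_id
"""

conn.executemany("INSERT INTO tasks VALUES (?, ?, ?)", [
    (1, "c6_tier0_baseline", None),  # hypothetical name; the baseline has no baseline of its own
    (2, "c6_tier2", 1),              # hypothetical name; tier-2 sub-task points at task 1
])
conn.executemany("INSERT INTO task_scores VALUES (?, ?, ?)", [
    (1, "model-a", 0.25),
    (2, "model-a", 0.75),
])
print(conn.execute(UPLIFT_DELTA_SQL).fetchall())  # [(2, 'model-a', 0.5)]
```

Matching on model_id in the baseline join means a model only gets a delta when it has been scored on both tiers.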
4. task_categories seed row for cyber_safety
Required for: every task in the six C1–C6 proxy test suites, since each task must reference a valid category.
```sql
INSERT INTO task_categories (name, description)
VALUES (
    'cyber_safety',
    'Cyber capability proxy tests: injection resistance, refusal quality, dual-use code generation, agentic containment, exfiltration-via-reasoning, non-expert uplift.'
)
ON CONFLICT (name) DO NOTHING;
```
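The ON CONFLICT (name) DO NOTHING clause makes the seed idempotent, but in PostgreSQL it only works if name carries a unique constraint or unique index. A minimal sketch of the re-run behaviour, using SQLite 3.24+ (which accepts the same upsert syntax) as a stand-in; the category_id column and the seed_cyber_safety helper are hypothetical.

```python
import sqlite3

# SQLite 3.24+ accepts the same INSERT ... ON CONFLICT (name) DO NOTHING syntax.
# As in PostgreSQL, the conflict target must match a unique constraint on name.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task_categories ("
    " category_id INTEGER PRIMARY KEY,"  # hypothetical surrogate key
    " name TEXT NOT NULL UNIQUE,"
    " description TEXT)"
)

SEED_SQL = (
    "INSERT INTO task_categories (name, description) VALUES (?, ?) "
    "ON CONFLICT (name) DO NOTHING"
)


def seed_cyber_safety(conn, description):
    """Insert the cyber_safety row; re-running leaves the existing row untouched."""
    conn.execute(SEED_SQL, ("cyber_safety", description))
    return conn.execute(
        "SELECT COUNT(*) FROM task_categories WHERE name = 'cyber_safety'"
    ).fetchone()[0]


print(seed_cyber_safety(conn, "first run"))   # 1
print(seed_cyber_safety(conn, "second run"))  # 1: the conflicting insert is skipped
```

Because the conflict is ignored rather than merged, editing the description later requires an explicit UPDATE (or DO UPDATE) rather than re-running the seed.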
5. schema_versions bump
A new migration version row must be created, and allow_migration set to TRUE, before the DDL above is applied via the migration runner.
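A sketch of how a runner might enforce that gate follows. The issue does not describe the runner, so treating allow_migration as a column on the newest schema_versions row, the version column name, and the apply_migration and MigrationBlocked names are all assumptions. SQLite stands in for PostgreSQL; both run DDL transactionally, so a failing statement rolls the whole batch back.

```python
import sqlite3


class MigrationBlocked(RuntimeError):
    """Raised when the newest schema_versions row does not allow migration."""


def apply_migration(conn, ddl_statements):
    # Hypothetical gate: read the newest schema_versions row and refuse to run
    # unless its allow_migration flag is set.
    row = conn.execute(
        "SELECT version, allow_migration FROM schema_versions "
        "ORDER BY version DESC LIMIT 1"
    ).fetchone()
    if row is None or not row[1]:
        raise MigrationBlocked("newest schema_versions row does not have allow_migration = TRUE")
    # Apply every statement in one transaction so a failure leaves no partial DDL.
    conn.execute("BEGIN")
    try:
        for stmt in ddl_statements:
            conn.execute(stmt)
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return row[0]


conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit; BEGIN/COMMIT are explicit
conn.execute("CREATE TABLE schema_versions (version INTEGER PRIMARY KEY, allow_migration INTEGER NOT NULL)")
conn.execute("CREATE TABLE tasks (task_id INTEGER PRIMARY KEY)")
conn.execute("INSERT INTO schema_versions VALUES (1, 0)")

ddl = ["ALTER TABLE tasks ADD COLUMN uplift_baseline_task_id INTEGER REFERENCES tasks(task_id)"]
try:
    apply_migration(conn, ddl)
except MigrationBlocked as exc:
    print("blocked:", exc)

conn.execute("INSERT INTO schema_versions VALUES (2, 1)")  # the version bump this item asks for
print("applied version", apply_migration(conn, ddl))
```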
Dependencies
Blocks
- C4 implementation (Agentic Containment): needs the agent_traces table and the run_model_metrics.trace_id column
- C6 implementation (Non-Expert Uplift): needs the tasks.uplift_baseline_task_id column
- All C1–C6 task insertion: needs the cyber_safety task category row
Does not block
- C2 (Refusal Quality / CBRN Floor): static path; can proceed immediately
- C3 (Dual-Use Code Generation): static path; can proceed immediately
- C1-direct (Prompt Injection, direct variant): static path; can proceed immediately
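The two lists above separate two gates: implementation work, which only C4 and C6 wait on, and inserting tasks into the database, which every suite waits on until the category row exists. A hypothetical encoding of that distinction is sketched below. The change labels are illustrative strings, not identifiers used by any tool, and C1 (non-direct) and C5 are omitted because this issue does not list them.

```python
# Hypothetical encoding of the Blocks / Does not block lists.
# C1 (non-direct) and C5 are not listed in this issue, so they are left out.
IMPLEMENTATION_NEEDS = {
    "C1-direct": set(),
    "C2": set(),
    "C3": set(),
    "C4": {"agent_traces", "run_model_metrics.trace_id"},
    "C6": {"tasks.uplift_baseline_task_id"},
}
# Inserting any C1-C6 task into the bakeoff database needs the category row.
INSERTION_NEEDS = {"task_categories.cyber_safety"}


def unblocked(applied):
    """Given the set of applied changes, return (can_implement, can_insert_tasks)."""
    can_implement = sorted(
        item for item, needs in IMPLEMENTATION_NEEDS.items() if needs <= applied
    )
    can_insert = [item for item in can_implement if INSERTION_NEEDS <= applied]
    return can_implement, can_insert


print(unblocked(set()))
# (['C1-direct', 'C2', 'C3'], [])
print(unblocked({"task_categories.cyber_safety", "agent_traces", "run_model_metrics.trace_id"}))
# (['C1-direct', 'C2', 'C3', 'C4'], ['C1-direct', 'C2', 'C3', 'C4'])
```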
Opened by Bastion // 042309ZJUN26