
Commit de19350

samrusani and Sami Rusani authored
P12-S5: ship task-adaptive briefing (#156)
Co-authored-by: Sami Rusani <sr@samirusani>
1 parent dd77643 commit de19350

28 files changed

Lines changed: 2826 additions & 118 deletions

.ai/handoff/CURRENT_STATE.md

Lines changed: 9 additions & 8 deletions
Original file line number | Diff line number | Diff line change
@@ -9,26 +9,27 @@
99
- Phase 12 Sprint 1 (`P12-S1`) is shipped.
1010
- Phase 12 Sprint 2 (`P12-S2`) is shipped.
1111
- Phase 12 Sprint 3 (`P12-S3`) is shipped.
12-
- Phase 12 Sprint 4 (`P12-S4`) is the active execution sprint.
12+
- Phase 12 Sprint 4 (`P12-S4`) is shipped.
13+
- Phase 12 Sprint 5 (`P12-S5`) is the active execution sprint.
1314

1415
## Current Baseline Truth
1516
- Alice has typed memory, provenance, trust classes, correction/supersession behavior, open loops, recall, resumption, and explainability.
1617
- Alice exposes CLI, MCP, hosted/product, provider-runtime, and Hermes bridge surfaces.
1718
- The codebase already includes semantic retrieval, embeddings, entities/entity edges, trusted-fact promotion, retrieval evaluation fixtures, deterministic resumption briefs, daily briefs, chief-of-staff briefing flows, and the shipped `P12-S1` hybrid retrieval/reranking foundation with retrieval traces.
18-
- The codebase also includes the shipped `P12-S2` memory mutation candidate and operation foundation.
19+
- The codebase also includes the shipped `P12-S2` memory mutation candidate and operation foundation, the shipped `P12-S3` contradiction/trust foundation, and the shipped `P12-S4` public eval harness.
1920

2021
## Not Yet First-Class In Repo
21-
- task-adaptive brief compiler separated from current briefing surfaces
2222

2323
## Phase Transition Note
2424
- Phase 12 is active.
2525
- `P12-S1` is complete and establishes the retrieval baseline.
2626
- `P12-S2` is complete and establishes the mutation baseline.
2727
- `P12-S3` is complete and establishes the contradiction/trust baseline.
28-
- `P12-S4` is the active sprint and should benchmark shipped retrieval, mutation, and contradiction behavior without reopening those systems.
29-
- The current `P12-S4` branch implements the public eval harness, fixture catalog, and checked-in baseline artifact, pending Control Tower merge approval.
28+
- `P12-S4` is complete and establishes the public-eval baseline.
29+
- `P12-S5` is the active sprint and should build briefing behavior on top of shipped retrieval, mutation, contradiction, and eval baselines without reopening those systems.
30+
- The current `P12-S5` branch implements task-adaptive brief generation, comparison, and model-pack briefing defaults, pending Control Tower merge approval.
3031

3132
## Immediate Control Tower Decisions Needed
32-
- Decide public eval suite taxonomy and baseline artifact format.
33-
- Decide what eval artifacts are committed versus generated locally.
34-
- Decide whether `P12-S4` stays CLI-first or keeps the current branch `/v1/evals/*` API surface.
33+
- Decide briefing modes and payload schema for user recall, resume, worker subtask, and agent handoff.
34+
- Decide provider/model-pack fields for briefing strategy and max brief tokens.
35+
- Decide whether `P12-S5` needs CLI-only, API, and MCP surfaces simultaneously or can stage them.

ARCHITECTURE.md

Lines changed: 6 additions & 5 deletions
@@ -2,7 +2,7 @@
22

33
## Scope Boundary
44
- **Shipped baseline:** Phases 9-11 and Bridge `B1` through `B4`.
5-
- **Current repo execution posture:** `v0.2.0` is released; `P12-S1`, `P12-S2`, and `P12-S3` are shipped; `P12-S4` is the active sprint.
5+
- **Current repo execution posture:** `v0.2.0` is released; `P12-S1`, `P12-S2`, `P12-S3`, and `P12-S4` are shipped; `P12-S5` is the active sprint.
66
- **Phase 12 delta:** retrieval quality, mutation explicitness, contradiction handling, public evals, and adaptive briefing.
77

88
## Current System Overview
@@ -60,6 +60,7 @@ Alice is a modular continuity platform with shared continuity semantics across l
6060
### Product/Runtime
6161
- `workspaces`, `workspace_members`, `auth_sessions`, `devices`
6262
- `model_providers`, `provider_capabilities`, `model_packs`, `workspace_model_pack_bindings`
63+
- `task_briefs`
6364
- channel, task, trace, approval, and execution tables
6465

6566
## Current Key Flows
@@ -122,15 +123,15 @@ Delivered additions:
122123
Important baseline note: `P12-S3` is now the contradiction/trust baseline for the rest of Phase 12 and should not be reopened except where later sprint integration requires it.
123124

124125
### P12-S4: Public Eval Harness
125-
Expand the current retrieval evaluation foundation into public multi-suite benchmark runs and checked-in baseline reports.
126+
Shipped in `P12-S4`:
126127

127-
Planned additions:
128+
Delivered additions:
128129
- `eval_suites`
129130
- `eval_cases`
130131
- `eval_runs`
131132
- `eval_results`
132133

133-
Important baseline note: `P12-S4` should measure shipped retrieval, mutation, and contradiction behavior rather than redesign those systems.
134+
Important baseline note: `P12-S4` is now the evaluation baseline for the rest of Phase 12 and should not be reopened except where later sprint integration requires it.
134135
Source-of-truth note: the checked-in fixture catalog defines the authoritative suite/case set and ordering; persisted eval suite/case rows are synchronized snapshots for execution and audit, not an independent planning surface.
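
The source-of-truth note above can be illustrated with a minimal sketch, assuming the catalog is a list of suites with case keys and the persisted rows are a plain mapping. The function name `sync_snapshot` and the field names `suite_key` and `case_keys` are hypothetical stand-ins, not the actual `alicebot_api.public_evals` interface.

```python
from typing import Dict, List, Sequence, Tuple


def sync_snapshot(
    catalog: Sequence[dict],
    persisted: Dict[str, List[str]],
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Rebuild persisted suite/case rows from the checked-in catalog.

    The catalog is treated as authoritative: the result mirrors its suites
    and case ordering exactly, and any persisted suite that the catalog no
    longer lists is reported as pruned.
    """
    synced: Dict[str, List[str]] = {}
    for suite in catalog:
        # Catalog order wins; persisted ordering and stale cases are discarded.
        synced[suite["suite_key"]] = list(suite["case_keys"])
    pruned = sorted(set(persisted) - set(synced))
    return synced, pruned


catalog = [
    {"suite_key": "recall", "case_keys": ["r1", "r2"]},
    {"suite_key": "resumption", "case_keys": ["s1"]},
]
persisted = {"recall": ["r2", "stale_case"], "old_suite": ["x"]}
synced, pruned = sync_snapshot(catalog, persisted)
```

In this sketch the persisted rows never feed back into the catalog, which is one way to keep them a snapshot for execution and audit rather than an independent planning surface.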
135136

136137
### P12-S5: Task-Adaptive Briefing
@@ -140,7 +141,7 @@ Planned additions:
140141
- `task_briefs`
141142
- provider/model-pack briefing strategy fields
142143

143-
Important baseline note: Alice already has resumption, daily-brief, and chief-of-staff briefing surfaces. Phase 12 should treat those as starting points, not as greenfield briefing.
144+
Important baseline note: `P12-S5` should build on shipped retrieval, mutation, contradiction, and eval baselines. Existing resumption, daily-brief, and chief-of-staff briefing surfaces are starting points, not greenfield replacements.
144145

145146
## Security And Reliability Rules
146147
- Keep user/workspace isolation intact for continuity, provider, and channel data.

BUILD_REPORT.md

Lines changed: 39 additions & 57 deletions
@@ -1,82 +1,64 @@
11
# BUILD_REPORT
22

33
## Sprint Objective
4-
5-
Implement `P12-S4` public eval harness so Alice can run reproducible local eval suites, persist suite/case/run/result records, emit stable baseline report artifacts, and document what the measured quality surface means.
4+
Implement `P12-S5` task-adaptive briefing so the system can generate deterministic, explainable, role-specific context packs for `user_recall`, `resume`, `worker_subtask`, and `agent_handoff`, while preserving shipped retrieval, mutation, contradiction, trust, and eval behavior.
65

76
## Completed Work
8-
9-
- Added public eval persistence tables for `eval_suites`, `eval_cases`, `eval_runs`, and `eval_results`.
10-
- Added `alicebot_api.public_evals` with:
11-
- fixture-catalog loading
12-
- suite/case syncing into the database
13-
- fixture-backed recall, resumption, correction, contradiction, and open-loop evaluators
14-
- canonical report generation with stable digests
15-
- report writing helper for checked-in baseline artifacts
16-
- Added current-branch public eval API surfaces:
17-
- `GET /v1/evals/suites`
18-
- `POST /v1/evals/runs`
19-
- `GET /v1/evals/runs`
20-
- `GET /v1/evals/runs/{eval_run_id}`
21-
- Made the checked-in fixture catalog authoritative for suite listing and run selection.
22-
- Added pruning for persisted suite/case rows so removed catalog entries do not survive as stale runtime state.
23-
- Added explicit validation for unknown `suite_key` filters instead of silently returning partial or empty runs.
24-
- Added CLI surfaces:
25-
- `alicebot evals suites`
26-
- `alicebot evals run`
27-
- `alicebot evals runs`
28-
- `alicebot evals show`
29-
- Added public fixture definitions in `eval/fixtures/public_eval_suites.json`.
30-
- Added checked-in current-branch baseline report artifact in `eval/baselines/public_eval_harness_v1.json`, with final committed artifact format still pending Control Tower confirmation.
31-
- Added sprint-owned docs in `docs/evals/public_eval_harness.md`, explicitly framed as current branch behavior where API and artifact decisions are still pending.
32-
- Added focused unit and integration coverage for the runner, migration, API, CLI, and baseline reproduction path.
7+
- Added a dedicated task briefing compiler with four briefing modes.
8+
- Added deterministic briefing summaries, selection rules, truncation metadata, token budgeting, and comparison output.
9+
- Added task brief persistence through a new `task_briefs` table.
10+
- Added current-branch API surfaces for task-brief compile, inspect, and compare.
11+
- Added CLI surfaces for task-brief compile, inspect, and compare.
12+
- Added MCP tools for task-brief compile, inspect, and compare.
13+
- Added model-pack briefing defaults through `briefing_strategy` and `briefing_max_tokens`, and task-brief compilation now resolves those defaults when a workspace-selected model pack is available.
14+
- Added focused docs under `docs/briefing/`, explicitly framed as current branch behavior where briefing payload and surface-shape decisions are still pending.
15+
- Added unit and integration coverage for determinism, size reduction, persistence, CLI smoke, MCP smoke, API behavior, migration shape, and model-pack strategy fields.
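
As a rough illustration of the completed work listed above, the following sketch shows one way a deterministic, token-budgeted brief compiler could behave. Only the four mode names and the `briefing_max_tokens` field come from this report. The per-mode default budgets, the `Candidate` and `TaskBrief` shapes, the whitespace token estimate, and the greedy skip-on-overflow selection rule are assumptions for illustration, not the contents of `task_briefing.py`.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional, Sequence, Tuple

# Mode names are taken from the sprint objective.
MODES = ("user_recall", "resume", "worker_subtask", "agent_handoff")

# Hypothetical fallback budgets, used only when no model pack supplies one.
ASSUMED_DEFAULT_MAX_TOKENS = {
    "user_recall": 400,
    "resume": 800,
    "worker_subtask": 300,
    "agent_handoff": 1200,
}


@dataclass(frozen=True)
class Candidate:
    item_id: str
    text: str
    score: float


@dataclass(frozen=True)
class TaskBrief:
    mode: str
    max_tokens: int
    item_ids: Tuple[str, ...]
    used_tokens: int
    truncated_item_ids: Tuple[str, ...]  # truncation metadata


def estimate_tokens(text: str) -> int:
    # Crude whitespace estimate; a real compiler would likely use a tokenizer.
    return len(text.split())


def resolve_max_tokens(mode: str, model_pack: Optional[Mapping[str, Any]] = None) -> int:
    # A model-pack default wins when present; otherwise fall back per mode.
    if model_pack is not None and model_pack.get("briefing_max_tokens") is not None:
        return int(model_pack["briefing_max_tokens"])
    return ASSUMED_DEFAULT_MAX_TOKENS[mode]


def compile_brief(
    mode: str,
    candidates: Sequence[Candidate],
    model_pack: Optional[Mapping[str, Any]] = None,
) -> TaskBrief:
    if mode not in MODES:
        raise ValueError(f"unknown briefing mode: {mode}")
    budget = resolve_max_tokens(mode, model_pack)
    # Sort by score, then by id, so input order never changes the output.
    ordered = sorted(candidates, key=lambda c: (-c.score, c.item_id))
    kept, dropped, used = [], [], 0
    for cand in ordered:
        cost = estimate_tokens(cand.text)
        if used + cost <= budget:
            kept.append(cand.item_id)
            used += cost
        else:
            # Items that do not fit are skipped and recorded, not silently lost.
            dropped.append(cand.item_id)
    return TaskBrief(mode, budget, tuple(kept), used, tuple(dropped))
```

Because the sort key is total over `(score, item_id)`, two runs over the same candidates in any input order yield equal `TaskBrief` values, which is the property a determinism test would check.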
3316

3417
## Incomplete Work
35-
36-
- None inside the sprint packet scope.
18+
- None within the sprint packet scope.
3719

3820
## Files Changed
39-
40-
- `BUILD_REPORT.md`
41-
- `RULES.md`
21+
- `.ai/handoff/CURRENT_STATE.md`
4222
- `ARCHITECTURE.md`
23+
- `BUILD_REPORT.md`
4324
- `CURRENT_STATE.md`
44-
- `.ai/handoff/CURRENT_STATE.md`
4525
- `PRODUCT_BRIEF.md`
46-
- `ROADMAP.md`
4726
- `REVIEW_REPORT.md`
48-
- `apps/api/alembic/versions/20260414_0060_phase12_public_eval_harness.py`
49-
- `apps/api/src/alicebot_api/cli.py`
27+
- `ROADMAP.md`
28+
- `RULES.md`
29+
- `apps/api/src/alicebot_api/task_briefing.py`
5030
- `apps/api/src/alicebot_api/contracts.py`
51-
- `apps/api/src/alicebot_api/main.py`
52-
- `apps/api/src/alicebot_api/public_evals.py`
5331
- `apps/api/src/alicebot_api/store.py`
54-
- `scripts/check_control_doc_truth.py`
55-
- `docs/evals/public_eval_harness.md`
56-
- `eval/baselines/public_eval_harness_v1.json`
57-
- `eval/fixtures/public_eval_suites.json`
58-
- `tests/integration/test_cli_integration.py`
59-
- `tests/integration/test_public_evals_api.py`
60-
- `tests/unit/test_20260414_0060_phase12_public_eval_harness.py`
32+
- `apps/api/src/alicebot_api/model_packs.py`
33+
- `apps/api/src/alicebot_api/main.py`
34+
- `apps/api/src/alicebot_api/cli.py`
35+
- `apps/api/src/alicebot_api/cli_formatting.py`
36+
- `apps/api/src/alicebot_api/mcp_tools.py`
37+
- `apps/api/alembic/versions/20260414_0061_phase12_task_adaptive_briefing.py`
38+
- `docs/briefing/task-adaptive-briefing.md`
39+
- `tests/unit/test_task_briefing.py`
40+
- `tests/unit/test_model_packs.py`
6141
- `tests/unit/test_cli.py`
62-
- `tests/unit/test_main.py`
63-
- `tests/unit/test_public_evals.py`
42+
- `tests/unit/test_mcp.py`
43+
- `tests/unit/test_20260414_0061_phase12_task_adaptive_briefing.py`
44+
- `tests/integration/test_task_briefing_api.py`
45+
- `tests/integration/test_cli_integration.py`
46+
- `tests/integration/test_mcp_cli_parity.py`
47+
- `tests/integration/test_mcp_server.py`
48+
- `tests/integration/test_phase11_model_packs_api.py`
49+
- `scripts/check_control_doc_truth.py`
6450

6551
## Tests Run
66-
67-
- `./.venv/bin/pytest tests/unit/test_public_evals.py tests/unit/test_20260414_0060_phase12_public_eval_harness.py tests/unit/test_cli.py tests/unit/test_main.py tests/integration/test_public_evals_api.py tests/integration/test_cli_integration.py tests/integration/test_retrieval_evaluation_api.py -q`
68-
- Result: PASS (`83 passed`)
52+
- `./.venv/bin/pytest tests/unit/test_task_briefing.py tests/unit/test_model_packs.py tests/unit/test_cli.py tests/unit/test_mcp.py tests/unit/test_20260414_0061_phase12_task_adaptive_briefing.py tests/unit/test_continuity_resumption.py tests/unit/test_continuity_recall.py tests/unit/test_public_evals.py tests/integration/test_task_briefing_api.py tests/integration/test_cli_integration.py tests/integration/test_mcp_cli_parity.py tests/integration/test_phase11_model_packs_api.py tests/integration/test_mcp_server.py tests/integration/test_public_evals_api.py tests/integration/test_continuity_resumption_api.py tests/integration/test_retrieval_evaluation_api.py -q`
53+
- Result: PASS (`73 passed`)
6954
- `./.venv/bin/python scripts/check_control_doc_truth.py`
7055
- Result: PASS
71-
- `rg -n "/Users|samirusani|Desktop/Codex" RULES.md ARCHITECTURE.md CURRENT_STATE.md .ai/handoff/CURRENT_STATE.md PRODUCT_BRIEF.md ROADMAP.md docs/evals eval/fixtures eval/baselines`
56+
- `rg -n "/Users|samirusani|Desktop/Codex" RULES.md ARCHITECTURE.md CURRENT_STATE.md .ai/handoff/CURRENT_STATE.md PRODUCT_BRIEF.md ROADMAP.md docs/briefing`
7257
- Result: PASS (no matches)
7358

7459
## Blockers/Issues
75-
76-
- No sprint blocker remains.
77-
- The recall suite keeps one non-gating coverage snapshot for entity-edge expansion. It records the current shipped output with `score=0.0` while the suite still passes because the catalog marks that case as observational rather than a strict gate.
78-
- Final product policy is still pending for the Control Tower decisions called out in the sprint packet, including the committed artifact format and whether `/v1/evals/*` remains part of the accepted Phase 12 surface.
60+
- No remaining blockers.
61+
- Final product policy is still pending for the Control Tower decisions called out in the sprint packet, including the canonical persisted briefing payload shape, required model-pack briefing fields, and whether generation and comparison APIs should both ship in `P12-S5`.
7962

8063
## Recommended Next Step
81-
82-
Request Control Tower merge review against the current `P12-S4` branch head.
64+
Request Control Tower merge review against the current `P12-S5` branch head.

CURRENT_STATE.md

Lines changed: 9 additions & 8 deletions
@@ -12,26 +12,27 @@ Canonical handoff state lives at [.ai/handoff/CURRENT_STATE.md](.ai/handoff/CURR
1212
- Phase 12 Sprint 1 (`P12-S1`) is shipped.
1313
- Phase 12 Sprint 2 (`P12-S2`) is shipped.
1414
- Phase 12 Sprint 3 (`P12-S3`) is shipped.
15-
- Phase 12 Sprint 4 (`P12-S4`) is the active execution sprint.
15+
- Phase 12 Sprint 4 (`P12-S4`) is shipped.
16+
- Phase 12 Sprint 5 (`P12-S5`) is the active execution sprint.
1617

1718
## Current Baseline Truth
1819
- Alice has typed memory, provenance, trust classes, correction/supersession behavior, open loops, recall, resumption, and explainability.
1920
- Alice exposes CLI, MCP, hosted/product, provider-runtime, and Hermes bridge surfaces.
2021
- The codebase already includes semantic retrieval, embeddings, entities/entity edges, trusted-fact promotion, retrieval evaluation fixtures, deterministic resumption briefs, daily briefs, chief-of-staff briefing flows, and the shipped `P12-S1` hybrid retrieval/reranking foundation with retrieval traces.
21-
- The codebase also includes the shipped `P12-S2` memory mutation candidate and operation foundation.
22+
- The codebase also includes the shipped `P12-S2` memory mutation candidate and operation foundation, the shipped `P12-S3` contradiction/trust foundation, and the shipped `P12-S4` public eval harness.
2223

2324
## Not Yet First-Class In Repo
24-
- task-adaptive brief compiler separated from current briefing surfaces
2525

2626
## Phase Transition Note
2727
- Phase 12 is active.
2828
- `P12-S1` is complete and establishes the retrieval baseline.
2929
- `P12-S2` is complete and establishes the mutation baseline.
3030
- `P12-S3` is complete and establishes the contradiction/trust baseline.
31-
- `P12-S4` is the active sprint and should benchmark shipped retrieval, mutation, and contradiction behavior without reopening those systems.
32-
- The current `P12-S4` branch implements the public eval harness, fixture catalog, and checked-in baseline artifact, pending Control Tower merge approval.
31+
- `P12-S4` is complete and establishes the public-eval baseline.
32+
- `P12-S5` is the active sprint and should build briefing behavior on top of shipped retrieval, mutation, contradiction, and eval baselines without reopening those systems.
33+
- The current `P12-S5` branch implements task-adaptive brief generation, comparison, and model-pack briefing defaults, pending Control Tower merge approval.
3334

3435
## Immediate Control Tower Decisions Needed
35-
- Decide public eval suite taxonomy and baseline artifact format.
36-
- Decide what eval artifacts are committed versus generated locally.
37-
- Decide whether `P12-S4` stays CLI-first or keeps the current branch `/v1/evals/*` API surface.
36+
- Decide briefing modes and payload schema for user recall, resume, worker subtask, and agent handoff.
37+
- Decide provider/model-pack fields for briefing strategy and max brief tokens.
38+
- Decide whether `P12-S5` needs CLI-only, API, and MCP surfaces simultaneously or can stage them.

PRODUCT_BRIEF.md

Lines changed: 2 additions & 1 deletion
@@ -16,7 +16,8 @@ Alice is a pre-1.0 continuity platform for AI agents and agent-assisted workflow
1616
- `P12-S1` Hybrid Retrieval + Reranking is shipped.
1717
- `P12-S2` Automated Memory Operations is shipped.
1818
- `P12-S3` Contradiction Detection + Trust Calibration is shipped.
19-
- `P12-S4` Public Eval Harness is the active sprint.
19+
- `P12-S4` Public Eval Harness is shipped.
20+
- `P12-S5` Task-Adaptive Briefing is the active sprint.
2021

2122
## Next Phase
2223
### Phase 12: Retrieval Quality + Adaptive Continuity
