Stash is a macOS overlay + assistant that lets users drop files/links/questions into one place, keeps project context indexed in the background, and triggers Codex-powered actions without context switching.
North-star outcome for demo: show a non-technical user create and organize real project outputs (docs/files/tasks/code changes) from one overlay.
- Build window: 5 hours
- Team size: up to 4
- Must be fully open source (backend, frontend, integrations, prompts, scripts)
- Must be new work during hackathon
- Avoid banned categories and policy-risk use cases
- Frontend macOS app (SwiftUI)
- Floating semi-transparent Stash icon overlay
- Drag/drop zone with visual feedback
- Expandable panel: project switcher, file list, quick actions, chat input
- Status feed (indexing, agent runs, task completion)
- Backend background process (Python)
- Local service (FastAPI recommended) on localhost
- Watches project folders, indexes files/links/notes
- Runs Codex CLI tasks in controlled worktrees
- Manages skill usage (indexing skill + file/terminal execution skill)
- Context/Index Layer
- Per-project metadata store (SQLite)
- Embeddings/vector index (local vector DB)
- Unified “project layer” abstraction so users see logical project context, not raw folders
- Agent Orchestration
- GPT plans intent and task sequence
- Codex executes filesystem/code operations
- Tagged command protocol between planner output and executor
flowchart LR
A["User drags files/links/questions into Stash"] --> B["macOS UI"]
B --> C["Python Background Service"]
C --> D["Project Index (SQLite + Vector DB)"]
C --> E["Planner (GPT)"]
E --> F["Tagged Commands"]
F --> G["Codex CLI in Worktree"]
G --> H["Filesystem / Git Changes / Outputs"]
H --> C
C --> B
- Frontend Lead (macOS UI + UX)
- Backend Lead (API + indexing + orchestration)
- Integration Engineer (Codex CLI, worktrees, skill hooks, event streaming)
- Demo/Pitch Engineer (script, sample data, judging-criteria proof, polish)
- Frontend and backend run in parallel after agreeing on API contract.
- Integration engineer unblocks both sides with mocked payloads first, then real wiring.
- Demo engineer can start early with seeded scenario and continuously validate “wow moments.”
Transport: Local HTTP JSON (http://127.0.0.1:8765) + SSE for live events.
- A single project (folder) supports multiple conversations.
- Every conversation stores a full transcript (user, assistant, tool, system events).
- User can re-open any prior conversation and continue from the latest state.
- User can branch/fork a conversation into a new one for exploration.
- User can view project-scoped history across all conversations.
- User can search history by keyword, file/link reference, and time range.
- Runs/tasks and outputs are linked back to the originating conversation/message.
- Project switch restores last active conversation and recent context quickly.
- Project: { id, name, root_path, created_at, last_opened_at, active_conversation_id }
- Conversation: { id, project_id, title, status[active|archived], pinned, created_at, last_message_at, summary }
- Message: { id, project_id, conversation_id, role[user|assistant|tool|system], content, parts[], parent_message_id, sequence_no, created_at }
- Run: { id, project_id, conversation_id, trigger_message_id, status[pending|running|done|failed|cancelled], mode[manual|proactive], output_summary, created_at, finished_at }
- Asset: { id, project_id, kind[file|link|note], title, path_or_url, tags[], indexed_at }
- MessageAttachment: { id, message_id, asset_id, snippet_ref, created_at }
- Event: { id, type, project_id, conversation_id, run_id, ts, payload }

- projects table: project metadata; root_path is source of truth.
- conversations table: many conversations per project.
- messages table: append-only message log with sequence_no.
- runs + run_steps tables: execution trace mapped to messages.
- assets + asset_chunks + vector index: retrieval context.
- events table: SSE replay + debugging timeline.
All records include project_id for fast project-level history queries.
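A minimal sketch of how part of this storage model could look in SQLite. The column names follow the data model above; the column types, the `idx_messages_project` index name, and the `open_store` helper are assumptions for illustration, and only three of the tables are shown.

```python
import sqlite3

# Hypothetical schema sketch: column names come from the data model,
# column types and index names are assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS projects (
    id TEXT PRIMARY KEY,
    name TEXT NOT NULL,
    root_path TEXT NOT NULL UNIQUE,
    created_at TEXT NOT NULL,
    last_opened_at TEXT,
    active_conversation_id TEXT
);
CREATE TABLE IF NOT EXISTS conversations (
    id TEXT PRIMARY KEY,
    project_id TEXT NOT NULL REFERENCES projects(id),
    title TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'active'
        CHECK (status IN ('active', 'archived')),
    pinned INTEGER NOT NULL DEFAULT 0,
    created_at TEXT NOT NULL,
    last_message_at TEXT,
    summary TEXT
);
CREATE TABLE IF NOT EXISTS messages (
    id TEXT PRIMARY KEY,
    project_id TEXT NOT NULL REFERENCES projects(id),
    conversation_id TEXT NOT NULL REFERENCES conversations(id),
    role TEXT NOT NULL
        CHECK (role IN ('user', 'assistant', 'tool', 'system')),
    content TEXT NOT NULL,
    parent_message_id TEXT,
    sequence_no INTEGER NOT NULL,
    created_at TEXT NOT NULL,
    UNIQUE (conversation_id, sequence_no)
);
-- project_id on every row keeps project-level history queries cheap.
CREATE INDEX IF NOT EXISTS idx_messages_project
    ON messages (project_id, created_at);
"""


def open_store(path=":memory:"):
    """Open (or create) the metadata store and apply the schema."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn
```

The `UNIQUE (conversation_id, sequence_no)` constraint is one way to let the database itself reject duplicate sequence numbers within a conversation.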
Create/open project context (project maps 1:1 to folder root).
List projects for quick switcher.
Get project metadata plus active conversation.
Update project fields (rename, set active_conversation_id).
Create new conversation inside project.
Request
{
"title": "Spec drafting",
"start_mode": "manual"
}
Response
{
"id": "conv_001",
"project_id": "proj_123",
"title": "Spec drafting",
"status": "active",
"pinned": false,
"created_at": "2026-02-05T10:00:00Z",
"last_message_at": null
}
List conversations in project with pagination, sorting, and filters.
Query params:
- cursor
- limit
- status=active|archived
- pinned=true|false
- q=<search text>
Get conversation metadata.
Rename/pin/archive conversation.
Create a new conversation branched from a selected message or latest state.
Request
{
"from_message_id": "msg_104",
"title": "Alternative approach"
}
Export transcript as json or markdown.
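As a rough illustration of the export step, the sketch below renders a list of message dicts as either JSON or Markdown. The `export_transcript` name and the Markdown layout (`**role**: content`) are assumptions; the plan does not specify the export format beyond the two options.

```python
import json


def export_transcript(messages, fmt="json"):
    """Render messages (dicts with sequence_no, role, content) as text."""
    # Order by sequence_no so the export matches replay order.
    ordered = sorted(messages, key=lambda m: m["sequence_no"])
    if fmt == "json":
        return json.dumps(ordered, indent=2)
    if fmt == "markdown":
        return "\n\n".join(f"**{m['role']}**: {m['content']}" for m in ordered)
    raise ValueError(f"unsupported export format: {fmt}")
```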
Append user message and optionally start run.
Request
{
"role": "user",
"content": "Use this project context to draft a launch plan.",
"asset_ids": ["asset_pdf_1", "asset_link_2"],
"mode": "manual",
"start_run": true,
"idempotency_key": "1b91bc95-17da-4f20-a2eb-5ed0c0f8ce1f"
}
Response
{
"message_id": "msg_105",
"run_id": "run_201",
"status": "running"
}
Load message history (cursor pagination by sequence_no or timestamp).
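Cursor pagination by sequence_no could be sketched as follows. This assumes a `messages` table with `conversation_id` and `sequence_no` columns; the `list_messages` helper and its `after_seq` parameter are hypothetical names, not part of the contract.

```python
import sqlite3


def list_messages(conn, conversation_id, after_seq=0, limit=50):
    """Return one page of messages and the cursor for the next page.

    The cursor is the sequence_no of the last row returned, or None
    when there are no further rows.
    """
    rows = conn.execute(
        "SELECT id, sequence_no, role, content FROM messages "
        "WHERE conversation_id = ? AND sequence_no > ? "
        "ORDER BY sequence_no LIMIT ?",
        # Fetch one extra row to learn whether another page exists.
        (conversation_id, after_seq, limit + 1),
    ).fetchall()
    page = rows[:limit]
    next_cursor = page[-1][1] if len(rows) > limit else None
    return page, next_cursor
```

Keying the cursor on sequence_no rather than an offset keeps pages stable while new messages are appended to the conversation.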
Edit user message metadata (title/tags) or mark superseded (no destructive delete).
Regenerate assistant response from a specific message context.
Get run status, timing, and step trace.
Cancel in-flight run.
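One way to keep run status handling predictable is an explicit transition table. The statuses below come from the Run model; which transitions are allowed is an assumption made for this sketch, as are the `transition` and `can_cancel` helper names.

```python
# Assumed transition rules; only the status names come from the Run model.
ALLOWED_TRANSITIONS = {
    "pending": {"running", "cancelled"},
    "running": {"done", "failed", "cancelled"},
    "done": set(),
    "failed": set(),
    "cancelled": set(),
}


def transition(current, target):
    """Return the new status, or raise if the move is not allowed."""
    if target not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"cannot move run from {current!r} to {target!r}")
    return target


def can_cancel(status):
    """A run is cancellable only while it is pending or running."""
    return "cancelled" in ALLOWED_TRANSITIONS.get(status, set())
```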
Register dropped file/link/note.
Trigger (or retrigger) indexing for new/changed assets.
Read indexing status and diagnostics.
Project-scoped retrieval query (for proactive suggestions and grounding).
Unified timeline (conversation events + runs + indexing activity).
Search across all conversations and run summaries in a project.
Request
{
"query": "launch plan blockers",
"limit": 20,
"include_archived": true
}
SSE stream for UI updates; filterable by conversation_id.
Event types:
- conversation_created
- conversation_updated
- message_created
- message_delta
- message_finalized
- run_started
- run_step_started
- run_step_completed
- run_completed
- run_failed
- indexing_started
- indexing_progress
- indexing_completed
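For reference, each of these events could be written to the stream as a standard Server-Sent Events frame (`id:`, `event:`, and `data:` fields followed by a blank line). The sketch below is an assumption about the wire shape, not the agreed contract; `format_sse` and `matches_filter` are hypothetical helper names.

```python
import json


def format_sse(event_type, payload, event_id=None):
    """Serialize one event as an SSE frame terminated by a blank line."""
    lines = []
    if event_id is not None:
        # The id field lets a reconnecting client resume via Last-Event-ID.
        lines.append(f"id: {event_id}")
    lines.append(f"event: {event_type}")
    # Compact JSON never contains raw newlines, so one data line suffices.
    lines.append(f"data: {json.dumps(payload, separators=(',', ':'))}")
    return "\n".join(lines) + "\n\n"


def matches_filter(event, conversation_id=None):
    """Apply the optional conversation_id filter to an event dict."""
    if conversation_id is None:
        return True
    return event.get("conversation_id") == conversation_id
```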
Internal endpoint used by orchestrator to run tagged command payloads safely.
- Project is folder-bound: each project has one root_path; all conversations are scoped to it.
- Many conversations per project: no global chat; all chats belong to one project.
- Append-first history: messages/events are append-only; edits create superseding state.
- Soft-delete only: archive conversations instead of hard delete in MVP.
- Ordering guarantee: sequence_no is monotonic per conversation for deterministic replay.
- Resumability: reopening a conversation returns the last finalized assistant message + pending run state.
- Replay: timeline endpoint can rebuild exact session history for demo and debugging.
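The append-first and ordering invariants can be sketched as a single append helper. This assumes one writer process (the local backend) and a minimal `messages` table; the `idempotency_key` column, the `append_message` name, and the generated `msg_` id format are assumptions for illustration.

```python
import sqlite3
import uuid

# Assumed minimal table; idempotency_key is not part of the data model above.
MESSAGES_DDL = """
CREATE TABLE IF NOT EXISTS messages (
    id TEXT PRIMARY KEY,
    conversation_id TEXT NOT NULL,
    role TEXT NOT NULL,
    content TEXT NOT NULL,
    sequence_no INTEGER NOT NULL,
    idempotency_key TEXT,
    UNIQUE (conversation_id, sequence_no),
    UNIQUE (conversation_id, idempotency_key)
)
"""


def append_message(conn, conversation_id, role, content, idempotency_key=None):
    """Append a message and return (message_id, sequence_no).

    A retry with the same idempotency_key returns the original row
    instead of appending a duplicate.
    """
    with conn:  # commit on success, roll back on error
        if idempotency_key is not None:
            existing = conn.execute(
                "SELECT id, sequence_no FROM messages "
                "WHERE conversation_id = ? AND idempotency_key = ?",
                (conversation_id, idempotency_key),
            ).fetchone()
            if existing is not None:
                return existing
        # Next number is per conversation, starting at 1.
        (next_seq,) = conn.execute(
            "SELECT COALESCE(MAX(sequence_no), 0) + 1 FROM messages "
            "WHERE conversation_id = ?",
            (conversation_id,),
        ).fetchone()
        message_id = f"msg_{uuid.uuid4().hex[:8]}"
        conn.execute(
            "INSERT INTO messages "
            "(id, conversation_id, role, content, sequence_no, idempotency_key) "
            "VALUES (?, ?, ?, ?, ?, ?)",
            (message_id, conversation_id, role, content, next_seq, idempotency_key),
        )
    return (message_id, next_seq)
```

With more than one writer, the read-then-insert step would need stronger locking; the unique constraint would still reject a colliding sequence number rather than silently reordering history.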
Use parseable tags in planner output:
<codex_cmd>
worktree: stash/proj_123
cwd: /Users/<user>/Desktop/Project1
cmd: create file PROJECT_BRIEF.md with sections...
</codex_cmd>
Backend parser extracts blocks, validates against allowlist, executes with Codex CLI, and returns structured results to planner loop.
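A minimal sketch of the extraction and validation steps, assuming each block carries exactly the three `key: value` fields shown in the example. The plan does not define what the allowlist covers, so the check here (the command's cwd must sit inside the project root) is only one possible rule; `CodexCommand`, `parse_codex_commands`, and `is_within_root` are hypothetical names, and the actual Codex CLI invocation is left out.

```python
import re
from dataclasses import dataclass
from pathlib import Path

BLOCK_RE = re.compile(r"<codex_cmd>\s*(.*?)\s*</codex_cmd>", re.DOTALL)
REQUIRED_KEYS = ("worktree", "cwd", "cmd")


@dataclass
class CodexCommand:
    worktree: str
    cwd: str
    cmd: str


def parse_codex_commands(planner_output):
    """Extract every <codex_cmd> block from planner text."""
    commands = []
    for body in BLOCK_RE.findall(planner_output):
        fields = {}
        for line in body.splitlines():
            # Split on the first colon only, so values may contain colons.
            key, sep, value = line.partition(":")
            if sep and key.strip() in REQUIRED_KEYS:
                fields[key.strip()] = value.strip()
        missing = [k for k in REQUIRED_KEYS if k not in fields]
        if missing:
            raise ValueError(f"codex_cmd block missing fields: {missing}")
        commands.append(CodexCommand(**fields))
    return commands


def is_within_root(command, project_root):
    """Assumed allowlist rule: cwd must be the project root or below it."""
    root = Path(project_root).resolve()
    cwd = Path(command.cwd).resolve()
    return cwd == root or root in cwd.parents
```

Rejecting a block with missing fields, rather than guessing defaults, keeps the executor from running a command in an unintended directory.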
- Lock MVP scope, success criteria, demo story
- Freeze API contract and event names
- Create repo skeleton and task board
- Seed 1 realistic demo project folder
Frontend:
- Overlay icon, drag/drop, project list, expanded panel shell
- Chat input + activity feed UI with mocked events
Backend:
- FastAPI server scaffold
- Project/conversation/message/run endpoints
- File indexing pipeline + vector insert
Integration:
- Tagged command parser
- Codex execution adapter in worktree
- SSE event broadcaster
Demo/Pitch:
- Script v1 with before/after states
- Screenshots/video capture checkpoints
- Judging criteria mapping document
- Connect UI to live endpoints
- Replace mocks with real events
- Run 2 end-to-end flows and patch failures fast
- Tighten UX transitions and status messaging
- Add guardrails (error states/timeouts/fallbacks)
- Open-source/license/readme cleanup
- Freeze code except critical fixes
- Rehearse 3-5 minute demo twice
- Final pitch narrative + judging callouts
Must
- Overlay drag/drop + project switch
- Multi-conversation support per project (folder-scoped)
- Full conversation history with resume/replay per project
- Background indexing per project
- Task prompt -> Codex execution -> visible result
- One polished end-to-end demo flow
Should
- Proactive suggestions from indexed context
- Link ingestion and lightweight summarization
- Multi-project quick-switch performance
Could
- Fine-grained permission controls
- Advanced ranking for retrieval
- User drops files + links into Stash.
- Backend auto-indexes and shows progress in UI.
- User creates Conversation A and asks: “Create a project brief and organize assets by topic.”
- Planner emits tagged commands; Codex executes in worktree.
- User opens conversation history, resumes Conversation A, and asks a follow-up.
- User forks Conversation A into Conversation B to compare an alternative plan.
- New files appear in project root; UI shows run-linked completion summaries.
- User switches project and repeats quickly (proves multi-project context).
- Impact (25%)
- Position Stash as “developer-enablement OS layer” for non-coders.
- Show practical outcomes: organized files, generated docs, actionable project outputs.
- Codex App (25%)
- Demonstrate real Codex worktree operations and agent-driven edits end-to-end.
- Explicitly narrate planner -> tagged command -> Codex execution loop.
- Creative Use of Skills (25%)
- Show indexing skill + file/terminal skill usage in real workflows.
- Highlight on-the-fly skill generation/adaptation for project needs.
- Demo & Pitch (25%)
- Fast, visual, concrete workflow with clear before/after.
- Keep live demo centered on one compelling user job-to-be-done.
- Risk: API/UI mismatch
  Mitigation: Contract freeze in first 30 minutes + mocked payload tests.
- Risk: Codex command reliability
  Mitigation: Allowlist commands and strict parser; add fallback task mode.
- Risk: Conversation history bloat/latency
  Mitigation: Cursor pagination, archived threads, summary previews, and background compaction.
- Risk: Over-scoping
  Mitigation: Must/Should/Could gates and hard demo lock at 4:30.
- Working macOS overlay app
- Working Python background service with API + SSE
- At least one fully reliable end-to-end flow in live demo
- Public repo with license, setup instructions, and architecture notes
stash/
  frontend-macos/
  backend-service/
  shared-contract/
    openapi.yaml
    events.md
  demo-assets/
  docs/
    architecture.md
    pitch-outline.md
- Initialize monorepo and folder structure.
- Create shared-contract/openapi.yaml with project + conversation + message + run endpoints.
- Define SQLite schema (projects, conversations, messages, runs, events, assets).
- Scaffold FastAPI backend with /v1/projects, /conversations, and /events/stream.
- Scaffold SwiftUI app with overlay window, drop target, and conversation list panel.
- Implement project create/list + conversation create/list + message history pagination.
- Implement asset ingestion, indexing stub, and retrieval search endpoint.
- Implement send message -> run pipeline + tagged command parser stub.
- Wire Codex CLI execution adapter and stream run/message events to UI.
- Rehearse multi-project + multi-conversation end-to-end demo and lock scope.