Project: Autonomous AI Development System
Author: AI-Assisted Design
Date: January 2, 2026
Status: Draft - Ready for Implementation
- Executive Summary
- System Overview
- Architecture
- Components
- GitHub State Machine
- Workflow Lifecycle
- Testing Strategy
- Quality Gates
- CI/CD Integration
- Configuration
- Prompt Engineering
- Error Handling & Rollback
- Security Considerations
- Future Enhancements
- Implementation Roadmap
## Executive Summary

This system enables fully autonomous AI-driven development where:
- Users create GitHub issues describing tasks, features, or bugs
- A local cron service polls GitHub for new tasks
- OpenCode + oh-my-opencode (Sisyphus) executes the work in isolated git worktrees
- Ralph Loop enables iterative, self-correcting implementation until completion
- Quality gates ensure code meets standards before PR creation
- Completed work is submitted as Pull Requests with full documentation
- Users review and merge PRs from anywhere (including mobile)
The system is stateless by design: GitHub serves as the database, storing all task state in labels (with checkpoint comments for long-running tasks), all communication in comments, and all deliverables as PRs.
| Enhancement | Description |
|---|---|
| Ralph Loop Integration | Autonomous iterative development until task completion |
| Quality Gates | Mandatory linting, type-checking, build verification |
| Playwright E2E Testing | Automated browser testing for web features |
| CI/CD Integration | Wait for CI success before PR creation |
| Multi-Cycle State Management | Checkpoint-based continuation for long-running tasks |
| Enhanced Documentation | Automated doc generation and updates |
| Rollback & Debugging | Worktree preservation and recovery strategies |
| Principle | Implementation |
|---|---|
| Stateless Executor | Cron service has no persistent state; GitHub labels are the source of truth |
| Full Transparency | Every action logged via comments + OpenCode session sharing |
| Human-in-the-Loop | PRs always require human review; agents ask questions when uncertain |
| Parallel Execution | Multiple tasks via git worktrees, controlled concurrency |
| Mobile-First UX | GitHub notifications keep users informed anywhere |
| Quality-First | All code must pass quality gates before PR creation |
| Iterative Improvement | Ralph Loop enables self-correcting implementation |
| Component | Technology |
|---|---|
| Task Queue | GitHub Issues + Labels |
| State Management | GitHub Labels + Checkpoint Comments |
| Communication | GitHub Comments |
| Deliverables | GitHub Pull Requests |
| Orchestration | OpenCode + oh-my-opencode (Sisyphus) |
| Iterative Execution | Ralph Loop |
| E2E Testing | Playwright MCP |
| GitHub Integration | GitHub MCP + Personal Access Token |
| Scheduler | Node.js + node-cron |
| Isolation | Git Worktrees |
## Architecture

- **GitHub (Cloud)**, accessed through the GitHub API
  - **Issues** (tasks)
  - **Labels** (state)
  - **PRs** (delivery)
  - **CI** (quality)
- **Connection:** HTTPS, authenticated with a Personal Access Token (PAT)
- **Local Machine**
  - **Node-Cron Service**
    - Polls GitHub for `ai-task` issues
    - Manages concurrency (`MAX_CONCURRENT_TASKS`)
    - Spawns OpenCode orchestrators
    - Monitors task completion & checkpoints
    - Resumes multi-cycle tasks
  - **OpenCode Instances 1, 2, 3**, each running Sisyphus + Ralph Loop
  - **Worktrees**, one per instance: Issue #42 (`ai/issue-42`), Issue #43 (`ai/issue-43`), Issue #44 (`ai/issue-44`)
  - **Project Repository**: `main` (primary), with `.git/worktrees/` entries for `issue-42/`, `issue-43/`, and `issue-44/`
## Components

### Node-Cron Service

The central scheduler that polls GitHub and manages OpenCode instances.
Responsibilities:
- Poll GitHub for issues with the `ai-task` label
- Respect the `MAX_CONCURRENT_TASKS` limit
- Priority ordering (high > medium > low > none)
- Spawn OpenCode processes with appropriate prompts
- Monitor for completion or blocking
- Resume multi-cycle tasks from checkpoints
- Track Ralph Loop execution state
- Handle graceful shutdown
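Below is a minimal sketch of the priority ordering and concurrency check. It assumes a simplified issue shape (`QueuedIssue`, a hypothetical type) and uses an oldest-first tie-break, which the design does not specify. It follows the "high > medium > low > none" order listed above, so an issue with no priority label sorts after `ai-priority:low`.

```typescript
// Hypothetical issue shape: only the fields this sketch needs.
interface QueuedIssue {
  number: number;
  labels: string[];
  createdAt: Date;
}

// Lower rank is picked first; issues without a priority label rank last.
const PRIORITY_RANK = new Map<string, number>([
  ['ai-priority:high', 0],
  ['ai-priority:medium', 1],
  ['ai-priority:low', 2],
]);
const NO_PRIORITY_RANK = 3;

function priorityRank(issue: QueuedIssue): number {
  const ranks = issue.labels
    .map((label) => PRIORITY_RANK.get(label))
    .filter((rank): rank is number => rank !== undefined);
  // If several priority labels are present, the highest priority wins.
  return ranks.length > 0 ? Math.min(...ranks) : NO_PRIORITY_RANK;
}

// Returns the issues that may start now, respecting MAX_CONCURRENT_TASKS.
function selectNextIssues(
  queued: QueuedIssue[],
  activeCount: number,
  maxConcurrent: number,
): QueuedIssue[] {
  const freeSlots = Math.max(0, maxConcurrent - activeCount);
  return [...queued]
    .sort(
      (a, b) =>
        priorityRank(a) - priorityRank(b) ||
        // Tie-break (an assumption): oldest issue first.
        a.createdAt.getTime() - b.createdAt.getTime(),
    )
    .slice(0, freeSlots);
}
```

A cron tick that starts one task at a time would take only the first result: `const [next] = selectNextIssues(queued, activeTasks.size, MAX_CONCURRENT_TASKS)`.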
File: src/index.ts
interface CronService {
config: Config;
activeTasks: Map<number, TaskContext>;
ralphLoops: Map<number, RalphLoopContext>;
// Core methods
pollGitHub(): Promise<Issue[]>;
canStartNewTask(): boolean;
startTask(issue: Issue): Promise<void>;
checkBlockedTasks(): Promise<void>;
resumeFromCheckpoint(issue: Issue): Promise<void>;
monitorRalphLoops(): Promise<void>;
}
interface TaskContext {
issueNumber: number;
worktreePath: string;
opencodePid: number;
startedAt: Date;
sessionId: string;
phase: 'analysis' | 'implementation' | 'testing' | 'quality' | 'documentation' | 'pr';
}
interface RalphLoopContext {
issueNumber: number;
sessionId: string;
iterations: number;
status: 'running' | 'completed' | 'failed';
lastUpdate: Date;
}

### OpenCode + oh-my-opencode

The AI orchestration layer that performs the actual development work.
Capabilities via oh-my-opencode:
- Sisyphus: Primary orchestrator (Claude Opus 4.5)
- Oracle: Architecture and debugging advisor (GPT 5.2)
- Librarian: Documentation and OSS research (Claude Sonnet 4.5)
- Explore: Fast codebase navigation (Grok/Gemini)
- Frontend UI/UX Engineer: Visual development (Gemini 3 Pro)
- Background Agents: Parallel async execution
- Todo Continuation: Forces task completion
- Ralph Loop: Autonomous execution until done
- Playwright: Browser automation for E2E testing
Invocation:
opencode run --prompt "$ORCHESTRATOR_PROMPT" --model anthropic/claude-opus-4-5

### Ralph Loop

Ralph Loop enables autonomous, iterative development until task completion.
Capabilities:
- Detects the `<promise>DONE</promise>` signal for completion
- Auto-continues if the agent stops prematurely
- Ends when the task is complete or max iterations are reached (default 100)
- Configurable max iterations
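The sketch below shows the control flow these capabilities imply. It is not oh-my-opencode's implementation. `ralphLoop` and the `runIteration` callback are hypothetical stand-ins, where `runIteration` represents one agent turn that returns its output text.

```typescript
const DONE_SIGNAL = '<promise>DONE</promise>';

type RalphResult =
  | { status: 'completed'; iterations: number }
  | { status: 'failed'; iterations: number; reason: 'max-iterations' };

async function ralphLoop(
  prompt: string,
  runIteration: (prompt: string, iteration: number) => Promise<string>,
  maxIterations = 100,
): Promise<RalphResult> {
  for (let i = 1; i <= maxIterations; i++) {
    const output = await runIteration(prompt, i);
    // Completion is signalled explicitly by the agent.
    if (output.includes(DONE_SIGNAL)) {
      return { status: 'completed', iterations: i };
    }
    // No signal: the agent stopped early, so auto-continue with the same prompt.
  }
  return { status: 'failed', iterations: maxIterations, reason: 'max-iterations' };
}
```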
Usage Patterns:
- Multi-iteration implementation: `/ralph-loop "Implement this feature end-to-end. Test thoroughly. Don't stop until <promise>DONE</promise>."`
- Quality gate enforcement: `/ralph-loop "Fix lint/type/build errors. Re-run until all gates pass."`
- Test execution and fixing: `/ralph-loop "Run full test suite. Fix all failures. End with <promise>DONE</promise> when all pass."`

State Tracking:
// `this` is typed as the CronService so the function can read its ralphLoops map
async function monitorRalphLoops(this: CronService): Promise<void> {
  const STALE_AFTER_MS = 60 * 60 * 1000; // 1 hour
  for (const [issueNumber, context] of this.ralphLoops) {
    const elapsed = Date.now() - context.lastUpdate.getTime();
    // If running for more than an hour with no update, investigate
    if (context.status === 'running' && elapsed > STALE_AFTER_MS) {
      logger.warn(`Ralph Loop for issue #${issueNumber} may be stuck`);
      // Post a status comment or take other action
    }
  }
}

### Playwright MCP

Browser automation for comprehensive end-to-end testing.
Test Coverage Requirements:
- User flows: Login, create, update, delete
- Edge cases: Empty states, errors, invalid inputs
- Visual regression: Screenshots of key states
Test File Location: tests/e2e/feature-name.spec.ts
Run Command: bunx playwright test
### GitHub MCP

All of the agent's GitHub operations are performed via GitHub MCP (Model Context Protocol).
Operations:
- Read issue details and comments
- Add/remove labels
- Post comments (progress updates, questions, completion)
- Create branches
- Create pull requests
- Request reviews
- Check CI status
Authentication: Personal Access Token via environment variable.
### Git Worktrees

Isolated working directories for parallel task execution.
Structure:
/path/to/project/                  # Main repo
├── .git/
├── src/
├── package.json
└── ...
/path/to/project/.worktrees/       # Worktree root
├── issue-42/                      # Branch: ai/issue-42-fix-auth
│   ├── src/
│   └── ...
├── issue-43/                      # Branch: ai/issue-43-add-feature
│   ├── src/
│   └── ...
└── issue-44/                      # Branch: ai/issue-44-refactor
    ├── src/
    └── ...
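The next sketch shows how the cron service might derive a task's worktree path and branch. It assumes the `PROJECT_PATH`/`WORKTREE_DIR` layout above and `main` as the base branch. `planWorktree` is a hypothetical helper: it only builds the arguments for `git worktree add -b <new-branch> <path> <start-point>` and does not run git.

```typescript
interface WorktreePlan {
  path: string;
  branch: string;
  gitArgs: string[];
}

function planWorktree(
  projectPath: string,
  worktreeDir: string,
  issueNumber: number,
  slug: string,
  baseBranch = 'main',
): WorktreePlan {
  // Strip trailing slashes so the joined path has no "//".
  const path = `${projectPath.replace(/\/+$/, '')}/${worktreeDir}/issue-${issueNumber}`;
  const branch = `ai/issue-${issueNumber}-${slug}`;
  // -b creates the branch from baseBranch and checks it out in the new worktree.
  const gitArgs = ['worktree', 'add', '-b', branch, path, baseBranch];
  return { path, branch, gitArgs };
}
```

The returned `gitArgs` could then be passed to `spawnSync('git', plan.gitArgs, { cwd: projectPath })` from `node:child_process`.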
Worktree Retention Policy:
- Blocked tasks: Keep worktree for continuation
- Failed tasks: Keep worktree for debugging (minimum 7 days)
- Completed tasks: Keep for reference (configurable retention)
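These retention rules can be expressed as a small predicate. The sketch assumes the task outcome and worktree age are already known. `canCleanWorktree` is a hypothetical name. It reads `AUTO_CLEAN_WORKTREES=false` (from the configuration section) as "never delete automatically", which is an interpretation rather than something the design states.

```typescript
type TaskOutcome = 'blocked' | 'failed' | 'completed';

const MIN_FAILED_RETENTION_DAYS = 7; // failed tasks: "minimum 7 days"

// retentionDays mirrors WORKTREE_RETENTION_DAYS, autoClean mirrors AUTO_CLEAN_WORKTREES.
function canCleanWorktree(
  outcome: TaskOutcome,
  ageDays: number,
  retentionDays: number,
  autoClean: boolean,
): boolean {
  if (!autoClean) return false; // automatic cleanup disabled
  if (outcome === 'blocked') return false; // still needed for continuation
  const keepDays =
    outcome === 'failed'
      ? Math.max(retentionDays, MIN_FAILED_RETENTION_DAYS)
      : retentionDays;
  return ageDays > keepDays;
}
```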
## GitHub State Machine

| Label | Color | Description |
|---|---|---|
| `ai-task` | 🟢 Green | Ready for AI pickup |
| `ai-priority:high` | 🔴 Red | High priority |
| `ai-priority:medium` | 🟡 Yellow | Medium priority |
| `ai-priority:low` | 🔵 Blue | Low priority (default) |
| `ai-in-progress` | 🟠 Orange | Currently being worked on |
| `ai-blocked` | 🟣 Purple | Waiting for human clarification |
| `ai-review-ready` | 🩵 Cyan | PR created, awaiting review |
| `ai-debugging` | 🔶 Amber | Task failed, needs manual debugging |
State transitions:

- (new issue) → `ai-task`: the user adds the label (it can also be added to existing issues)
- `ai-task` → `ai-in-progress`: the cron service picks up the task
- `ai-in-progress` → `ai-blocked` (needs input): the agent asks a question
- `ai-blocked` → `ai-in-progress`: a human replies
- `ai-in-progress` → `ai-review-ready` (PR created): the agent continues to completion
- `ai-review-ready` → (closed): the PR is merged
- (closed) → `ai-task`: a new task can be queued
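Because labels are the source of truth, the state machine can be checked in code. This is a minimal sketch. It assumes `ai-blocked` takes precedence when it coexists with `ai-in-progress`, matching the blocked flow's "keep ai-in-progress" step. The names `TaskState`, `TRANSITIONS`, `canTransition`, and `stateFromLabels` are hypothetical.

```typescript
// "new" and "closed" are issue states rather than labels.
type TaskState = 'new' | 'ai-task' | 'ai-in-progress' | 'ai-blocked' | 'ai-review-ready' | 'closed';

const TRANSITIONS: Record<TaskState, TaskState[]> = {
  'new': ['ai-task'], // user adds ai-task
  'ai-task': ['ai-in-progress'], // cron service picks the issue up
  'ai-in-progress': ['ai-blocked', 'ai-review-ready'],
  'ai-blocked': ['ai-in-progress'], // human replies
  'ai-review-ready': ['closed'], // PR merged
  'closed': ['ai-task'], // new task queued
};

function canTransition(from: TaskState, to: TaskState): boolean {
  return TRANSITIONS[from].includes(to);
}

// Derives the current state from an issue's labels and open/closed status.
function stateFromLabels(labels: string[], isOpen: boolean): TaskState {
  if (!isOpen) return 'closed';
  // ai-blocked is added on top of ai-in-progress, so it is checked first.
  for (const state of ['ai-review-ready', 'ai-blocked', 'ai-in-progress', 'ai-task'] as const) {
    if (labels.includes(state)) return state;
  }
  return 'new';
}
```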
## Workflow Lifecycle

**Cron Tick (every `POLL_INTERVAL_MS`)**

1. Query GitHub: issues with `label:ai-task -label:ai-in-progress`
2. Sort by priority (high > medium > low > none)
3. Check: `activeTasks.size < MAX_CONCURRENT_TASKS`?
4. If yes, pick the top issue and start the task
5. Check `ai-blocked` issues for new replies
6. Check `ai-in-progress` issues for checkpoint resumption
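The queries for steps 1, 5, and 6 can be built from standard GitHub search qualifiers (`repo:`, `is:issue`, `is:open`, `label:`, `-label:`). In this sketch `buildPollQueries` is hypothetical. Two parts go beyond the listed steps and are assumptions: the `is:open` filter, and excluding `ai-blocked` from the resumption query.

```typescript
interface PollQueries {
  ready: string; // step 1: tasks waiting to be picked up
  blocked: string; // step 5: tasks waiting for a human reply
  inProgress: string; // step 6: tasks that may resume from a checkpoint
}

function buildPollQueries(repo: string): PollQueries {
  const base = `repo:${repo} is:issue is:open`;
  return {
    ready: `${base} label:ai-task -label:ai-in-progress`,
    blocked: `${base} label:ai-blocked`,
    inProgress: `${base} label:ai-in-progress -label:ai-blocked`,
  };
}
```

Each string would be passed as the `q` parameter of GitHub's issue search endpoint.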
**Starting Task for Issue #42**

1. Remove `ai-task` label
2. Add `ai-in-progress` label
3. Post comment: "🤖 Starting work on this issue..."
4. Create git worktree: `.worktrees/issue-42`
5. Create branch: `ai/issue-42-{slug}`
6. Spawn opencode process with orchestrator prompt
7. Track in `activeTasks` map
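Step 5 does not say how `{slug}` is produced. The sketch below derives it from the issue title. `slugify`, `branchNameFor`, the ASCII-only character set, and the 40-character cap are all assumptions.

```typescript
function slugify(title: string, maxLength = 40): string {
  const slug = title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-') // collapse runs of other characters into "-"
    .replace(/^-+|-+$/g, ''); // trim leading/trailing dashes
  // Cap the length, then drop any dash left dangling by the cut.
  const trimmed = slug.slice(0, maxLength).replace(/-+$/, '');
  return trimmed || 'task'; // fallback when the title has no usable characters
}

function branchNameFor(issueNumber: number, title: string): string {
  return `ai/issue-${issueNumber}-${slugify(title)}`;
}
```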
**OpenCode Orchestrator Execution**

**Phase 3.1: Analysis**
1. Read full issue context (description, comments)
2. Analyze requirements
3. Identify dependencies and edge cases
4. Post first progress comment: "🔍 Analysis complete..."

**Phase 3.2: Exploration**
1. Explore codebase for relevant patterns
2. Find similar features using Librarian
3. Check for existing tests
4. Document findings

**Phase 3.3: Planning**
1. Create implementation plan (if complex, consult Oracle)
2. Break into sub-phases if needed
3. Identify testing strategy
4. Post checkpoint with plan

**Phase 3.4: Implementation (Ralph Loop)**
1. `/ralph-loop "Implement feature following plan."`
   - Iterate until complete or max iterations
   - Auto-fix issues discovered during implementation
   - Use specialists (@frontend-ui-ux-engineer, @librarian)
2. Post progress updates every 30 minutes
3. Post checkpoint at each phase completion

**Phase 3.5: Testing (Ralph Loop + Playwright)**
1. If tests exist:
   - `/ralph-loop "Run tests. Fix failures."`
2. If no tests exist:
   - Generate Playwright E2E tests
   - `/ralph-loop "Generate and run tests. Fix failures."`
3. Only proceed when tests pass

**Phase 3.6: Quality Gates**
1. Run linting
2. Run type checking
3. Build project
4. If any gate fails:
   - `/ralph-loop "Fix quality issues."`
5. Only proceed when all gates pass

**Phase 3.7: CI Verification**
1. Push to GitHub
2. Wait for CI to complete (max `CI_WAIT_TIMEOUT_MINUTES` minutes)
3. If CI fails → `/ralph-loop` to fix, wait for re-run
4. If CI passes → Continue to documentation phase

**Phase 3.8: Documentation**
1. @document-writer: Generate comprehensive docs
2. Update README.md if user-facing
3. Generate API docs if applicable
4. Document migrations/changes

**Phase 3.9: PR Creation**
1. Create PR with comprehensive description
2. Include: test results, coverage, docs link, session link
3. Update labels to `ai-review-ready`
4. Post completion comment
5. Share session
**Agent Encounters Uncertainty**

1. Add `ai-blocked` label (keep `ai-in-progress`)
2. Post comment with specific question(s)
3. Include context: what was tried, what's unclear
4. Post checkpoint for current state
5. OpenCode process exits gracefully
6. Task remains in `activeTasks` as "blocked"

[Later: Human replies on GitHub]

7. Next cron cycle detects new comment after `ai-blocked`
8. Remove `ai-blocked` label
9. Re-spawn opencode with continuation prompt
10. Agent reads new comment + checkpoint, continues work
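Step 7 requires telling a human reply apart from the agent's own comments. The agent authenticates with the user's PAT, so its comments appear under the user's login and the author field alone cannot distinguish the two. This sketch assumes two things: the agent appends a hidden marker (`AGENT_MARKER`, hypothetical) to everything it posts, and the time `ai-blocked` was added is already known.

```typescript
// Hypothetical hidden marker the agent appends to every comment it posts.
const AGENT_MARKER = '<!-- ai-agent -->';

interface IssueComment {
  body: string;
  createdAt: Date;
}

// True when someone other than the agent has commented since the task was blocked.
function hasHumanReplySince(comments: IssueComment[], blockedAt: Date): boolean {
  return comments.some(
    (c) => !c.body.includes(AGENT_MARKER) && c.createdAt.getTime() > blockedAt.getTime(),
  );
}
```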
**Task Completed Successfully**

1. All quality gates passed
2. CI status is 'success'
3. PR created and linked to issue
4. Remove `ai-in-progress` label
5. Add `ai-review-ready` label
6. Post completion comment with:
   - Summary of changes
   - Test results and coverage
   - PR link
   - Session share link
7. Remove from `activeTasks`
8. Worktree retained per `WORKTREE_RETENTION_DAYS`

[Human reviews PR, merges or requests changes]
## Testing Strategy

Use Ralph Loop strategically for test execution and fixing:
### Testing Phase (CRITICAL - Must Complete Before PR)
**Strategy:**
1. If tests exist → Run them via Ralph Loop
2. If tests fail → Fix and retry until pass
3. If no tests exist → Generate Playwright E2E tests
4. Run generated tests → Fix failures
5. Only create PR when tests pass
**Commands:**
/ralph-loop "Ensure all existing tests pass. Fix any failures."
**OR if no tests:**
/ralph-loop "Generate comprehensive Playwright E2E tests for this feature. Run tests and fix all failures. End with <promise>DONE</promise>"
**CRITICAL RULE:** Do NOT create PR until Ralph Loop completes successfully with the `<promise>DONE</promise>` signal.

For web features, comprehensive E2E testing is required:
### E2E Testing with Playwright
**Test Coverage Requirements:**
- User flows: Login, create, update, delete
- Edge cases: Empty states, errors, invalid inputs
- Visual regression: Screenshots of key states
**Playwright Test Generation Pattern:**
1. Use @frontend-ui-ux-engineer to identify test scenarios
2. Generate Playwright tests covering:
- Happy path
- Error handling
- Validation
3. Run tests locally
4. If tests fail → Fix and retry via Ralph Loop
5. Create PR only when tests pass
**Test File Location:** tests/e2e/feature-name.spec.ts
**Run Command:** bunx playwright test

When tests don't exist:
### When Tests Don't Exist
1. Analyze the feature requirements
2. Ask Librarian: "Find test patterns in this codebase"
3. Generate tests using Playwright (for E2E) or framework used by project
4. Ensure tests cover:
- Core functionality
- Error cases
- Edge cases
5. Run and fix until passing
**Example Test Generation Prompt:**
Generate comprehensive tests for the implemented feature. Use the existing testing framework (found in tests/ directory or package.json devDependencies).
Tests must include:
- Unit tests for individual functions
- Integration tests for API endpoints
- E2E tests for user-facing features (use Playwright for web features)

### Test Failure Handling
**When tests fail:**
1. Analyze failure root cause
2. Use Oracle: "Debug this test failure and propose fix"
3. Implement fix
4. Re-run tests
5. Repeat until all tests pass
**When tests pass:**
1. Run full test suite to ensure no regressions
2. Check coverage if supported
3. Document test coverage in PR description
**When tests are flaky:**
1. Add .skip annotation with reason
2. Create follow-up issue to fix flaky test
3. Continue with the rest of the test suite

## Quality Gates

Before creating a PR, you MUST verify:
1. **Linting:** `npm run lint` (or `bun run lint` / `pnpm lint` / `make lint`)
   - Fix all lint errors
   - Fix all lint warnings OR justify why warnings are acceptable
2. **Type Checking:** `npm run type-check` (or `tsc --noEmit`)
   - Zero type errors allowed
   - Use Oracle to resolve complex type issues if needed
3. **Building:** `npm run build` (or `bun run build` / `make build`)
   - Build must succeed
   - Analyze and fix any build warnings
4. **Formatting:** `npm run format` (or `bunx prettier --write .`)
   - Code must be properly formatted
   - Commit formatting changes separately if needed
5. **Existing Tests:** `npm test` (or `bun test` / `pytest` / `cargo test`)
   - All tests must pass
   - No test skips without justification
CRITICAL RULE: Do NOT create PR until ALL quality gates pass. Use Ralph Loop if gates fail repeatedly.
import { spawnSync } from 'node:child_process';
interface QualityGate {
  name: string;
  command: string[]; // [executable, ...args]
  required: boolean;
}
const GATES: QualityGate[] = [
  { name: "Linting", command: ["npm", "run", "lint"], required: true },
  { name: "Type Checking", command: ["npx", "tsc", "--noEmit"], required: true },
  { name: "Build", command: ["npm", "run", "build"], required: true },
  { name: "Tests", command: ["npm", "test"], required: true }
];
// Runs each gate in the worktree; throws on the first required failure.
async function runQualityGates(worktreePath: string): Promise<boolean> {
  for (const gate of GATES) {
    const [cmd, ...args] = gate.command;
    // spawnSync takes an argv array and reports the exit status instead of
    // throwing (execSync takes a single string and throws on non-zero exit).
    const result = spawnSync(cmd, args, { cwd: worktreePath, encoding: 'utf8' });
    if (result.status !== 0) {
      logger.error(`${gate.name} failed`, result.stderr);
      if (gate.required) {
        throw new Error(`Required quality gate ${gate.name} failed`);
      }
    }
  }
  return true;
}

### When Quality Gates Fail
**Automatic Retry (if enabled):**
1. Add comment: "⚠️ Quality gate failed: ${gateName}. Attempting fix..."
2. Use Ralph Loop: /ralph-loop "Fix ${gateName} errors. Run until gates pass."
3. Re-run quality gates
4. If max attempts reached β Block task and ask for guidance
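The automatic-retry policy above can be sketched as a loop bounded by `MAX_QUALITY_ATTEMPTS`. `enforceQualityGates` and its two callbacks are hypothetical hooks. `runGates` returns the names of the failing gates, and `attemptFix` stands in for the Ralph Loop fix step.

```typescript
type GateOutcome = 'passed' | 'blocked';

async function enforceQualityGates(
  runGates: () => Promise<string[]>, // names of failed gates; empty means all passed
  attemptFix: (failed: string[]) => Promise<void>,
  maxAttempts: number,
): Promise<GateOutcome> {
  let failed = await runGates();
  for (let attempt = 1; failed.length > 0 && attempt <= maxAttempts; attempt++) {
    await attemptFix(failed);
    failed = await runGates(); // re-run every gate after each fix attempt
  }
  // Still failing after the last attempt: block the task and ask for guidance.
  return failed.length === 0 ? 'passed' : 'blocked';
}
```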
**Manual Block:**
1. Add ai-blocked label
2. Post comment with failure details
3. Wait for human guidance

## CI/CD Integration

`.github/workflows/test.yml`:
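name: Test Suite
on:
  pull_request:
    types: [opened, synchronize, reopened]
  push:
    # ai/** lets agent branches run CI before a PR exists (Phase 3.7)
    branches: [main, 'ai/**']
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
      # The steps below call bun/bunx, so install Bun explicitly
      - name: Setup Bun
        uses: oven-sh/setup-bun@v2
      - name: Install dependencies
        run: bun install
      - name: Run linting
        run: bun run lint
      - name: Type check
        run: bun run type-check
      - name: Run tests
        run: bun test --coverage
      - name: Build
        run: bun run build
      # Playwright needs browser binaries and OS dependencies on a fresh runner
      - name: Install Playwright browsers
        run: bunx playwright install --with-deps
      - name: E2E Tests
        run: bunx playwright test
        env:
          CI: true

### Interpreting CI Results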
**Before Creating PR:**
1. Push branch to GitHub
2. Wait for CI to complete (max CI_WAIT_TIMEOUT_MINUTES)
3. If CI fails:
   - Use Ralph Loop: /ralph-loop "Fix CI failures. Re-run tests. Don't create PR until CI passes."
4. If CI passes:
- Document CI results in PR description
- Include test coverage report
- Continue to PR creation
**CI as Quality Gate:**
CRITICAL RULE: Do NOT create PR until:
1. CI status is 'success'
2. All quality gates pass
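3. Tests pass

// `github` is an Octokit client and `sleep(ms)` a promise-based delay helper
async function waitForCI(this: CronService, branchName: string): Promise<boolean> {
  const POLL_MS = 30_000;
  const PASSING = ['success', 'neutral', 'skipped'];
  const FAILING = ['failure', 'timed_out', 'cancelled', 'action_required'];
  const maxWait = this.config.ciWaitTimeoutMinutes * 60 * 1000;
  for (let waited = 0; waited < maxWait; waited += POLL_MS) {
    const { data } = await github.rest.checks.listForRef({
      owner: this.config.githubRepo.owner,
      repo: this.config.githubRepo.repo,
      ref: branchName,
    });
    const runs = data.check_runs;
    // Any failing check run fails CI
    if (runs.some((r) => FAILING.includes(r.conclusion ?? ''))) {
      return false;
    }
    // Success only once every check run has completed with a passing conclusion
    if (runs.length > 0 && runs.every((r) => r.status === 'completed' && PASSING.includes(r.conclusion ?? ''))) {
      return true;
    }
    // No check runs yet, or some still queued/in progress
    await sleep(POLL_MS);
  }
  throw new Error('CI timeout');
}

## Configuration

Create a `.env` file in the project root: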
# ============================================================
# GitHub Configuration
# ============================================================
# Personal Access Token with repo scope
GITHUB_TOKEN=ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
# Repository to monitor (owner/repo format)
GITHUB_REPO=your-username/your-repo
# ============================================================
# Scheduler Configuration
# ============================================================
# How often to poll GitHub (milliseconds)
POLL_INTERVAL_MS=300000
# Maximum concurrent tasks
MAX_CONCURRENT_TASKS=2
# ============================================================
# OpenCode Configuration
# ============================================================
# Path to project repository (main worktree)
PROJECT_PATH=/path/to/your/project
# Worktree directory (relative to PROJECT_PATH)
WORKTREE_DIR=.worktrees
# OpenCode model for orchestration
OPENCODE_MODEL=anthropic/claude-opus-4-5
# ============================================================
# Testing Strategy
# ============================================================
# Enable Ralph Loop for autonomous testing
RALPH_LOOP_ENABLED=true
RALPH_LOOP_MAX_ITERATIONS=100
# Enable Playwright E2E testing
PLAYWRIGHT_E2E_ENABLED=true
# Test coverage requirements (percentage)
MIN_TEST_COVERAGE=80
# ============================================================
# Quality Gates
# ============================================================
# Quality gate enforcement
ENFORCE_QUALITY_GATES=true
# Max quality gate fix attempts
MAX_QUALITY_ATTEMPTS=3
# ============================================================
# Task State Management
# ============================================================
# Checkpoint posting interval (minutes)
CHECKPOINT_INTERVAL_MINUTES=30
# Worktree retention days
WORKTREE_RETENTION_DAYS=7
# Auto-clean worktrees older than retention
AUTO_CLEAN_WORKTREES=false
# ============================================================
# CI/CD Integration
# ============================================================
# Wait for CI before PR (minutes)
CI_WAIT_TIMEOUT_MINUTES=10
# CI status as quality gate
REQUIRE_CI_PASS=true
# ============================================================
# Documentation
# ============================================================
# Auto-generate documentation
AUTO_GENERATE_DOCS=true
# Auto-update README
AUTO_UPDATE_README=true
# Documentation framework type
DOC_FRAMEWORK=typedoc
# ============================================================
# LLM Provider API Keys
# ============================================================
# Anthropic (Claude) - Required for Sisyphus
ANTHROPIC_API_KEY=sk-ant-xxxxxxxxxxxxxxxxxxxxxxxxxxxx
# OpenAI (ChatGPT) - Required for Oracle
OPENAI_API_KEY=sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
# Google (Gemini) - Optional
GOOGLE_API_KEY=AIzaxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
# ============================================================
# Logging & Debugging
# ============================================================
# Log level: debug, info, warn, error
LOG_LEVEL=info
# Share OpenCode sessions publicly
SHARE_SESSIONS=true
# Progress update frequency (minutes)
PROGRESS_UPDATE_INTERVAL_MINUTES=10

| Variable | Default | Description |
|---|---|---|
| `RALPH_LOOP_ENABLED` | `true` | Enable Ralph Loop for autonomous testing |
| `RALPH_LOOP_MAX_ITERATIONS` | `100` | Max iterations before declaring failure |
| `PLAYWRIGHT_E2E_ENABLED` | `true` | Use Playwright for E2E testing |
| `ENFORCE_QUALITY_GATES` | `true` | Block PR creation until gates pass |
| `MAX_QUALITY_ATTEMPTS` | `3` | Max quality gate fix attempts |
| `MIN_TEST_COVERAGE` | `80` | Minimum test coverage percentage |
| Variable | Default | Description |
|---|---|---|
| `CHECKPOINT_INTERVAL_MINUTES` | `30` | Post checkpoint every X minutes |
| `WORKTREE_RETENTION_DAYS` | `7` | Keep worktrees for X days after completion |
| `AUTO_CLEAN_WORKTREES` | `false` | Automatically clean old worktrees |
| Variable | Default | Description |
|---|---|---|
| `REQUIRE_CI_PASS` | `true` | Wait for CI success before PR |
| `CI_WAIT_TIMEOUT_MINUTES` | `10` | Max minutes to wait for CI |
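The sketch below loads these variables into a typed, validated object. It covers only a subset of the variables. It takes a plain key/value map (for example `process.env` after the `.env` file is loaded) so it has no dependencies, and its defaults come from the sample `.env` and the tables above. `loadConfig` and `AgentConfig` are hypothetical names.

```typescript
interface AgentConfig {
  githubRepo: { owner: string; repo: string };
  pollIntervalMs: number;
  maxConcurrentTasks: number;
  ralphLoopMaxIterations: number;
  maxQualityAttempts: number;
  worktreeRetentionDays: number;
  autoCleanWorktrees: boolean;
  requireCiPass: boolean;
  ciWaitTimeoutMinutes: number;
}

type Env = Record<string, string | undefined>;

function intVar(env: Env, key: string, fallback: number): number {
  const raw = env[key];
  if (raw === undefined || raw === '') return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value) || value < 0) {
    throw new Error(`${key} must be a non-negative integer, got "${raw}"`);
  }
  return value;
}

function boolVar(env: Env, key: string, fallback: boolean): boolean {
  const raw = env[key];
  if (raw === undefined || raw === '') return fallback;
  if (raw === 'true') return true;
  if (raw === 'false') return false;
  throw new Error(`${key} must be "true" or "false", got "${raw}"`);
}

function loadConfig(env: Env): AgentConfig {
  // GITHUB_REPO must be exactly "owner/repo".
  const [owner, repo, ...rest] = (env['GITHUB_REPO'] ?? '').split('/');
  if (!owner || !repo || rest.length > 0) {
    throw new Error('GITHUB_REPO must be in owner/repo format');
  }
  return {
    githubRepo: { owner, repo },
    pollIntervalMs: intVar(env, 'POLL_INTERVAL_MS', 300000),
    maxConcurrentTasks: intVar(env, 'MAX_CONCURRENT_TASKS', 2),
    ralphLoopMaxIterations: intVar(env, 'RALPH_LOOP_MAX_ITERATIONS', 100),
    maxQualityAttempts: intVar(env, 'MAX_QUALITY_ATTEMPTS', 3),
    worktreeRetentionDays: intVar(env, 'WORKTREE_RETENTION_DAYS', 7),
    autoCleanWorktrees: boolVar(env, 'AUTO_CLEAN_WORKTREES', false),
    requireCiPass: boolVar(env, 'REQUIRE_CI_PASS', true),
    ciWaitTimeoutMinutes: intVar(env, 'CI_WAIT_TIMEOUT_MINUTES', 10),
  };
}
```

Failing fast on malformed values at startup is safer than discovering, for example, a non-numeric `MAX_CONCURRENT_TASKS` during a cron tick.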
## Prompt Engineering

This prompt is sent to OpenCode when starting a task:
# GitHub Task Orchestrator - Issue #{issueNumber}
You are an autonomous AI developer working on a GitHub issue. Your task is to fully implement the requested changes, run quality gates, and create a PR.
## Context
**Repository:** {owner}/{repo}
**Issue:** #{issueNumber}
**Title:** {issueTitle}
**Branch:** ai/issue-{issueNumber}-{slug}
**Worktree:** {worktreePath}
## Issue Description
{issueBody}
## Previous Comments
{issueComments}
## Your Mission
### Phase 1: Analysis & Planning
1. **Understand** - Analyze the issue thoroughly. If anything is unclear, you MUST ask for clarification by posting a comment and adding the `ai-blocked` label.
2. **Explore** - Use the codebase exploration tools to understand existing patterns, conventions, and related code.
3. **Plan** - Create a clear implementation plan. For complex tasks, consult Oracle for architecture guidance.
4. Post checkpoint comment with your plan.
### Phase 2: Implementation (Use Ralph Loop)
5. **Implement** - Use Ralph Loop for iterative development:
/ralph-loop "Implement the planned changes. Test as you go. Don't stop until <promise>DONE</promise>."
- Write clean, well-documented code following existing project conventions
- Delegate to specialists when appropriate:
- Frontend/UI work → @frontend-ui-ux-engineer
- Documentation → @document-writer
- Research → @librarian
### Phase 3: Testing (CRITICAL)
6. **Test** - Run existing tests. If no tests exist, generate them:
/ralph-loop "Run all tests. If tests fail, fix them. If no tests exist, generate comprehensive tests. End with <promise>DONE</promise> when all pass."
- For web features, use Playwright for E2E testing
- Only proceed when ALL tests pass
### Phase 4: Quality Gates (REQUIRED)
7. **Quality Gates** - Run and pass ALL quality gates:
- Linting: `npm run lint`
- Type checking: `tsc --noEmit`
- Build: `npm run build`
- If any fail, use Ralph Loop to fix:
/ralph-loop "Fix all quality gate failures. Re-run until all pass."
### Phase 5: CI & PR
8. **Push & Wait for CI** - Push your branch and wait for CI to pass
9. **Create PR** - Only after CI passes, create a pull request with:
- Clear title referencing the issue
- Description of changes
- Test results and coverage
- Screenshots if UI changes
10. **Report** - Post a completion comment on the issue with:
- Summary of what was done
- Link to PR
- Test results
- Session share link for full transparency
## Ralph Loop Integration
You have access to Ralph Loop - use it strategically for:
**Multi-Iteration Work:**
- Complex implementations
- Test-fix cycles
- Quality gate enforcement
**Parameters:**
- Max iterations: 100
- Completion signal: `<promise>DONE</promise>`
- Auto-continue: true
## GitHub Operations
Use the GitHub MCP to:
- Read issue and comment details
- Add/remove labels (ai-in-progress, ai-blocked, ai-review-ready)
- Post comments for progress updates and questions
- Create pull requests
- Check CI status
## Critical Rules
1. **NEVER merge PRs** - Only create them. Humans review and merge.
2. **NEVER push to main** - Only work on your feature branch.
3. **ASK when uncertain** - Better to block and ask than to implement incorrectly.
4. **QUALITY FIRST** - Do NOT create PR until all quality gates pass.
5. **CI MUST PASS** - Wait for CI success before creating PR.
6. **TEST EVERYTHING** - Use Ralph Loop to ensure tests pass.
7. **Document everything** - Your work should be self-explanatory.
## Labels Reference
- `ai-in-progress`: Currently being worked on (already applied)
- `ai-blocked`: Need clarification (add this + post comment with question)
- `ai-review-ready`: PR created (apply when done)
## Begin
Start by reading the issue carefully and exploring the codebase. Create a todo list for all the work needed, then execute methodically using Ralph Loop for implementation phases.
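The `{placeholder}` tokens in this prompt can be filled with a single-pass substitution. In this sketch `renderPrompt` is hypothetical. Unknown placeholders are left intact so missing values stay visible. Because substitution runs in one pass, braces inside inserted values such as `{issueBody}` are not expanded a second time.

```typescript
function renderPrompt(template: string, vars: Record<string, string | number>): string {
  return template.replace(/\{(\w+)\}/g, (match: string, key: string) =>
    // Only substitute keys the caller actually provided.
    Object.prototype.hasOwnProperty.call(vars, key) ? String(vars[key]) : match,
  );
}
```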
Posted periodically during long-running tasks:
## 📍 Task Checkpoint
**Started:** {startTime}
**Last Update:** {timestamp}
**Elapsed:** {elapsedTime}
### ✅ Completed
- [x] {completedTodo1}
- [x] {completedTodo2}
- [x] {completedTodo3}
### 🔄 In Progress
- [ ] {currentTodo}
### 📋 Remaining
- [ ] {remainingTodo1}
- [ ] {remainingTodo2}
### Current Status
{statusDescription}
### Session ID
{sessionId}
---
_Next cron cycle will continue from this checkpoint._

## 📊 Progress Update
**Time:** {elapsed} since start | {timestamp}
### ✅ Completed (Last Update)
{completedTodos}
### 🔄 Currently Working On
**Task:** {currentTodo}
**Approach:** {approachDescription}
### 📋 Remaining
{remainingTodos}
### 💡 Insights
{interestingDiscoveriesOrPatterns}
---
_Last updated: {timestamp}_ | Session: [{sessionId}]({shareLink})

## ✅ Work Completed
I've finished implementing the requested changes.
### Summary
{changeSummary}
### Pull Request
🔗 #{prNumber} - {prTitle}
### Changes Made
{changesList}
### Quality Gates
- ✅ Linting: Passed
- ✅ Type Checking: Passed
- ✅ Build: Passed
- ✅ Tests: {testCount} passing ({coverage}% coverage)
- ✅ CI: Passed
### Testing
{testingNotes}
### Full Session Log
🔗 [View complete AI session]({sessionShareUrl})
---
_Please review the PR and merge when ready. Feel free to request changes or ask questions._

## 🤔 Clarification Needed
I've started working on this issue but need some clarification before proceeding.
### Context
{whatWasAttempted}
### Questions
{specificQuestions}
### What I'm Considering
{options}
### Current Progress
📍 See checkpoint above for current state.
---
_Please reply to this comment with the needed information. I'll continue once you respond._

## Error Handling & Rollback

| Category | Example | Handling |
|---|---|---|
| GitHub API | Rate limit, auth failure | Exponential backoff, notify user |
| OpenCode | Process crash, timeout | Retry once, then block with error comment |
| Git | Merge conflict, push rejected | Post error, add ai-blocked label |
| LLM | Context overflow, API error | Session recovery (oh-my-opencode handles) |
| Quality Gates | Lint/type/build failures | Ralph Loop auto-fix, then block if persistent |
| CI | Test failures, build errors | Ralph Loop auto-fix, then block if persistent |
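The "exponential backoff" handling for GitHub API errors can be sketched as follows, using full jitter. The base delay, cap, retry count, and jitter strategy are assumptions. `withBackoff` retries on any error and takes the sleep function as a parameter so it can be tested without real delays. A production version might retry only on rate-limit responses.

```typescript
// Delay before retry `attempt` (0-based): base * 2^attempt, capped at maxMs,
// then "full jitter" picks a uniform random value in [0, ceiling).
function backoffDelayMs(
  attempt: number,
  baseMs = 1000,
  maxMs = 60_000,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(maxMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

async function withBackoff<T>(
  op: () => Promise<T>,
  sleep: (ms: number) => Promise<void>,
  maxRetries = 5,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await op();
    } catch (err) {
      // Out of retries: rethrow so the caller can notify the user.
      if (attempt >= maxRetries) throw err;
      await sleep(backoffDelayMs(attempt));
    }
  }
}
```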
## ⚠️ Error Encountered
I encountered an error while working on this issue.
### Error Type
{errorType}
### Details
{errorDetails}
### What Was Attempted
{attemptedAction}
### Worktree Location
`.worktrees/issue-{number}`
### Session Log
🔗 [View session for debugging]({sessionShareUrl})
---
_The `ai-blocked` label has been added. Please investigate and reply with guidance, or remove the `ai-task` label to cancel._
### Worktree Preservation Rules
**NEVER immediately clean up worktree on failure.**
1. If task is blocked → Keep worktree for later continuation
2. If task fails critically → Keep worktree for manual debugging
3. If task succeeds → Keep worktree for reference (per WORKTREE_RETENTION_DAYS)

Scenario 1: User rejects PR entirely
1. Add comment acknowledging rejection
2. Keep worktree intact for manual review
3. Wait for further instructions
Worktree location: `.worktrees/issue-{number}`

Scenario 2: Need to restart task
1. `cd .worktrees/issue-42`
2. `git reset --soft HEAD~1` (undo the last commit but keep its changes staged)
3. Continue from the checkpoint

When a task fails and needs manual debugging:
````markdown
## Debugging Mode

I've encountered issues I can't automatically resolve.

### What Happened
{errorDetails}

### Worktree Location
`.worktrees/issue-{number}`

### What You Can Do

**Option 1: Debug in worktree**
```bash
cd .worktrees/issue-{number}
git status
# Make manual changes and test
```

**Option 2: Provide guidance**
Reply to this comment with specific instructions.

**Option 3: Cancel task**
Remove the `ai-task` label and I'll abandon this work.

[View full session]({sessionShareUrl})
````
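The worktree preservation rules listed earlier in this section can be condensed into one cleanup predicate. A sketch under stated assumptions: `TaskOutcome` and `mayRemoveWorktree` are hypothetical names, `retentionDays` stands in for the `WORKTREE_RETENTION_DAYS` setting, "succeeded" is taken to mean the PR was merged, and blocked or failed worktrees are never removed automatically, leaving that decision to a human.

```typescript
// Hypothetical outcome values for a finished or paused task.
type TaskOutcome = "blocked" | "failed" | "succeeded";

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Returns true only when automatic cleanup is allowed:
// - blocked: keep for later continuation
// - failed: keep for manual debugging
// - succeeded: keep for reference until the retention window has passed
function mayRemoveWorktree(
  outcome: TaskOutcome,
  finishedAt: Date,
  now: Date,
  retentionDays: number, // e.g. parsed from WORKTREE_RETENTION_DAYS
): boolean {
  if (outcome !== "succeeded") return false;
  const ageDays = (now.getTime() - finishedAt.getTime()) / MS_PER_DAY;
  return ageDays >= retentionDays;
}
```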
### Graceful Degradation
1. **Single task fails:** Mark as blocked, continue other tasks
2. **GitHub API unavailable:** Pause polling, retry with backoff
3. **All LLM providers down:** Pause all tasks, alert user
4. **Cron process crash:** State preserved on GitHub; restart resumes from checkpoints
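Rule 1 (a single failing task must not stop the others) maps naturally onto `Promise.allSettled`, which waits for every promise and reports each outcome separately instead of rejecting on the first failure. A sketch, assuming hypothetical `runTask` and `markBlocked` callbacks supplied by the task manager; concurrency limits are left out for brevity.

```typescript
// Hypothetical shape of a queued task; only the issue number matters here.
interface QueuedTask {
  issueNumber: number;
}

// Runs all tasks concurrently; a rejection marks only that task as blocked.
async function runBatch(
  tasks: QueuedTask[],
  runTask: (task: QueuedTask) => Promise<void>,
  markBlocked: (task: QueuedTask, reason: unknown) => Promise<void>,
): Promise<number[]> {
  const results = await Promise.allSettled(tasks.map((t) => runTask(t)));
  const blocked: number[] = [];
  for (let i = 0; i < results.length; i++) {
    const result = results[i];
    if (result.status === "rejected") {
      blocked.push(tasks[i].issueNumber);
      await markBlocked(tasks[i], result.reason);
    }
  }
  return blocked; // issue numbers that were moved to the blocked state
}
```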
---
## Security Considerations
### Token Security
| Risk | Mitigation |
|------|------------|
| Token in logs | Never log full token; mask in output |
| Token in comments | Never include tokens in GitHub comments |
| Token exposure | Store in `.env`, add to `.gitignore` |
| Excessive scope | Use fine-grained PAT with minimal permissions |
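For the "never log full token" mitigation, the logger can redact known secret values before writing anything. A minimal sketch, assuming the secrets are the values loaded from `.env` (such as the PAT) and that `redact` is a hypothetical helper in `utils/logger.ts`:

```typescript
// Hypothetical logger helper: replaces each known secret value with a marker.
function redact(text: string, secrets: readonly string[]): string {
  // Longest first, so a secret that contains another is replaced whole.
  const ordered = secrets.slice().sort((a, b) => b.length - a.length);
  let result = text;
  for (const secret of ordered) {
    if (secret.length === 0) continue; // "" would match between every character
    // split/join replaces all occurrences without building a RegExp from the secret.
    result = result.split(secret).join("[REDACTED]");
  }
  return result;
}
```

Applying this to every log line and every outgoing comment body covers both the "token in logs" and "token in comments" rows.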
### Required PAT Permissions
For a fine-grained personal access token:
Repository permissions:
- Contents: Read and write (for pushing branches)
- Issues: Read and write (for labels/comments)
- Pull requests: Read and write (for creating PRs)
- Checks: Read (for CI status)
- Metadata: Read (required)
### Code Execution Safety
| Risk | Mitigation |
|------|------------|
| Malicious issue content | Agent runs in worktree, not main |
| Destructive commands | PRs require human review before merge |
| Secrets in code | Agent instructed never to commit secrets |
| Infinite loops | Ralph Loop has max iterations, oh-my-opencode has task timeout |
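The iteration bound can be pictured as a loop that stops either when the agent's output contains the completion promise used in the Ralph Loop prompts (`<promise>DONE</promise>`) or when the budget runs out. This is an illustrative sketch of the guard, not how oh-my-opencode implements Ralph Loop; `runIteration` and `runBoundedLoop` are hypothetical.

```typescript
const COMPLETION_MARKER = "<promise>DONE</promise>";

type LoopResult =
  | { status: "done"; iterations: number }
  | { status: "exhausted"; iterations: number }; // caller should block the task

// Calls runIteration until its output contains the completion marker
// or maxIterations is reached, whichever comes first.
async function runBoundedLoop(
  runIteration: (iteration: number) => Promise<string>,
  maxIterations: number,
): Promise<LoopResult> {
  for (let i = 1; i <= maxIterations; i++) {
    const output = await runIteration(i);
    if (output.includes(COMPLETION_MARKER)) {
      return { status: "done", iterations: i };
    }
  }
  return { status: "exhausted", iterations: maxIterations };
}
```

An `exhausted` result is the point at which the task would receive the `ai-blocked` label instead of looping further.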
### Worktree Isolation
- Each task operates in isolated worktree
- Changes never affect main directly
- Failed tasks can be discarded entirely
- No cross-task interference
---
## Future Enhancements
### Phase 2: Bot Identity
Replace PAT with GitHub App for `[bot]` identity:
1. Create custom GitHub App
2. Implement token refresh in cron service
3. All actions appear as `your-app[bot]`
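For step 2, GitHub App installation access tokens expire after one hour, so the cron service needs to cache the current token and fetch a new one before it lapses. A sketch, assuming a hypothetical `fetchToken` callback that creates an installation access token (via the GitHub API or an authentication library) and returns it with its expiry time; the five-minute refresh margin is an arbitrary choice.

```typescript
// Token plus its expiry, as returned by the hypothetical fetcher.
interface InstallationToken {
  token: string;
  expiresAt: Date;
}

// Caches a GitHub App installation token and refreshes it shortly before expiry.
class InstallationTokenCache {
  private current: InstallationToken | null = null;
  private pending: Promise<InstallationToken> | null = null;
  private readonly fetchToken: () => Promise<InstallationToken>;
  private readonly refreshMarginMs: number;
  private readonly now: () => Date;

  constructor(
    fetchToken: () => Promise<InstallationToken>,
    refreshMarginMs: number = 5 * 60 * 1000, // assumption: refresh 5 minutes early
    now: () => Date = () => new Date(), // injectable clock for testing
  ) {
    this.fetchToken = fetchToken;
    this.refreshMarginMs = refreshMarginMs;
    this.now = now;
  }

  async getToken(): Promise<string> {
    const cached = this.current;
    if (cached && cached.expiresAt.getTime() - this.now().getTime() > this.refreshMarginMs) {
      return cached.token;
    }
    // Concurrent callers share a single in-flight refresh.
    if (!this.pending) {
      this.pending = this.fetchToken().finally(() => {
        this.pending = null;
      });
    }
    const fresh = await this.pending;
    this.current = fresh;
    return fresh.token;
  }
}
```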
### Phase 3: Multi-Repository
Extend to multiple repositories:
```javascript
{
  repositories: [
    { owner: "myorg", repo: "frontend" },
    { owner: "myorg", repo: "backend" },
    { owner: "myorg", repo: "shared-lib" }
  ]
}
```
Scheduling improvements:

- Priority decay (old low-priority tasks escalate)
- Time-based scheduling (don't run during work hours)
- Resource-aware scaling (adjust concurrency based on load)
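Priority decay could be expressed as an effective priority that improves the longer a task waits. A sketch, assuming tasks carry a numeric priority where 0 is most urgent (mirroring the P0/P1/P2 convention of the roadmap table) and a hypothetical `daysPerLevel` setting; ties fall back to oldest first, matching the `CREATED_AT` ascending order of the task query.

```typescript
// Hypothetical queued-task shape for scheduling decisions.
interface PendingTask {
  issueNumber: number;
  priority: number; // 0 = most urgent (P0), 1 = P1, ...
  createdAt: Date;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Each full `daysPerLevel` days of waiting raises urgency by one level, never past 0.
function effectivePriority(task: PendingTask, now: Date, daysPerLevel: number): number {
  const waitedDays = (now.getTime() - task.createdAt.getTime()) / DAY_MS;
  const boost = Math.floor(waitedDays / daysPerLevel);
  return Math.max(0, task.priority - boost);
}

// Most urgent first; equal urgency falls back to oldest first.
function orderTasks(tasks: PendingTask[], now: Date, daysPerLevel: number): PendingTask[] {
  return [...tasks].sort(
    (a, b) =>
      effectivePriority(a, now, daysPerLevel) - effectivePriority(b, now, daysPerLevel) ||
      a.createdAt.getTime() - b.createdAt.getTime(),
  );
}
```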

Metrics and monitoring:

- Task completion rates
- Average time per task
- Token usage tracking
- Quality gate pass rates
- Test coverage trends
- Web dashboard for monitoring
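Most of these metrics can be derived from a small per-task record. A sketch, assuming a hypothetical `TaskRecord` shape collected when a task finishes (how and where those records are stored is left open here):

```typescript
// Hypothetical per-task record.
interface TaskRecord {
  completed: boolean; // reached ai-review-ready rather than ai-blocked
  durationMinutes: number;
  tokensUsed: number;
  qualityGateRuns: number; // total gate runs for the task
  qualityGatePasses: number; // runs where every gate passed
}

interface MetricsSummary {
  completionRate: number; // 0..1
  avgMinutesPerCompletedTask: number;
  totalTokens: number;
  qualityGatePassRate: number; // 0..1
}

// Aggregates records; empty inputs yield zeros instead of NaN.
function summarize(records: TaskRecord[]): MetricsSummary {
  const completed = records.filter((r) => r.completed);
  const gateRuns = records.reduce((sum, r) => sum + r.qualityGateRuns, 0);
  const gatePasses = records.reduce((sum, r) => sum + r.qualityGatePasses, 0);
  return {
    completionRate: records.length ? completed.length / records.length : 0,
    avgMinutesPerCompletedTask: completed.length
      ? completed.reduce((sum, r) => sum + r.durationMinutes, 0) / completed.length
      : 0,
    totalTokens: records.reduce((sum, r) => sum + r.tokensUsed, 0),
    qualityGatePassRate: gateRuns ? gatePasses / gateRuns : 0,
  };
}
```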

Handle PR review feedback:

- Watch for change-request comments on `ai-review-ready` PRs
- Resume work to address feedback
- Push updates to same branch
- Wait for CI, re-submit for review
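Detecting "changes requested" on an `ai-review-ready` PR comes down to each reviewer's most recent decisive review. A sketch over a simplified review shape modeled on GitHub's pull request review states (`APPROVED`, `CHANGES_REQUESTED`, `COMMENTED`, `DISMISSED`, `PENDING`); it assumes plain comment reviews and pending drafts do not override a reviewer's earlier decision, and `needsChanges` is a hypothetical name.

```typescript
type ReviewState = "APPROVED" | "CHANGES_REQUESTED" | "COMMENTED" | "DISMISSED" | "PENDING";

// Simplified review record (author login, state, submission time).
interface Review {
  author: string;
  state: ReviewState;
  submittedAt: string; // ISO 8601 timestamp
}

// True if any reviewer's latest decisive review requests changes.
function needsChanges(reviews: Review[]): boolean {
  const latest = new Map<string, ReviewState>();
  const ordered = [...reviews].sort(
    (a, b) => Date.parse(a.submittedAt) - Date.parse(b.submittedAt),
  );
  for (const review of ordered) {
    // Comments and pending drafts leave the reviewer's previous decision in place.
    if (review.state === "COMMENTED" || review.state === "PENDING") continue;
    latest.set(review.author, review.state);
  }
  return Array.from(latest.values()).some((state) => state === "CHANGES_REQUESTED");
}
```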

## Implementation Roadmap

| Component | Priority | Complexity |
|---|---|---|
| Node-cron scheduler | P0 | Medium |
| GitHub polling | P0 | Low |
| Label state management | P0 | Low |
| Worktree management | P0 | Medium |
| OpenCode spawning | P0 | Medium |
| Orchestrator prompt | P0 | Medium |
| Ralph Loop integration | P0 | Medium |
| Quality gates | P0 | Medium |
| Playwright E2E testing | P1 | High |
| CI/CD integration | P1 | Medium |
| Progress comments | P1 | Low |
| Session sharing | P1 | Low |
| Blocked task detection | P1 | Medium |
| Continuation flow | P1 | Medium |
| Checkpoint management | P1 | Medium |
| Error handling | P1 | Medium |
| Worktree preservation | P2 | Low |
| Documentation automation | P2 | Medium |

Project structure:

```
opencode-github-orchestrator/
├── src/
│   ├── index.ts              # Entry point, cron setup
│   ├── config.ts             # Environment loading
│   ├── github/
│   │   ├── client.ts         # GitHub API wrapper
│   │   ├── issues.ts         # Issue queries
│   │   ├── labels.ts         # Label management
│   │   ├── comments.ts       # Comment templates
│   │   └── ci.ts             # CI status checking
│   ├── tasks/
│   │   ├── manager.ts        # Task lifecycle
│   │   ├── worktree.ts       # Git worktree ops
│   │   ├── opencode.ts       # OpenCode process management
│   │   ├── quality.ts        # Quality gate runner
│   │   ├── checkpoint.ts     # Checkpoint management
│   │   └── ralph-loop.ts     # Ralph Loop tracking
│   ├── prompts/
│   │   ├── orchestrator.ts   # Main prompt template
│   │   ├── continuation.ts   # Unblock continuation
│   │   └── templates.ts      # Comment templates
│   └── utils/
│       ├── logger.ts         # Logging utilities
│       └── slug.ts           # Title-to-slug conversion
├── .env.example              # Environment template
├── package.json
├── tsconfig.json
└── README.md
```
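`utils/slug.ts` turns an issue title into the slug part of branch names such as `ai/issue-42-slug`. One possible implementation, assuming the branch format is `ai/issue-{number}-{slug}`, a 40-character slug limit, and a `task` fallback for titles with no usable characters (both limits are arbitrary choices); `slugify` and `branchName` are hypothetical names.

```typescript
const MAX_SLUG_LENGTH = 40; // assumption: keeps branch names readable

// Lowercase, strip accents, collapse non-alphanumeric runs into single hyphens.
function slugify(title: string): string {
  const slug = title
    .normalize("NFKD")
    .replace(/[\u0300-\u036f]/g, "") // drop combining accent marks
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "")
    .slice(0, MAX_SLUG_LENGTH)
    .replace(/-+$/g, ""); // truncation may leave a trailing hyphen
  return slug || "task";
}

function branchName(issueNumber: number, title: string): string {
  return `ai/issue-${issueNumber}-${slugify(title)}`;
}
```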
Find tasks to process:

```graphql
query FindAiTasks($owner: String!, $repo: String!) {
  repository(owner: $owner, name: $repo) {
    issues(
      first: 10
      states: OPEN
      labels: ["ai-task"]
      orderBy: { field: CREATED_AT, direction: ASC }
    ) {
      nodes {
        number
        title
        body
        labels(first: 10) {
          nodes { name }
        }
        comments(last: 10) {
          nodes {
            body
            createdAt
            author { login }
          }
        }
      }
    }
  }
}
```

Check for blocked task replies:

```graphql
query CheckBlockedTasks($owner: String!, $repo: String!) {
  repository(owner: $owner, name: $repo) {
    issues(
      first: 10
      states: OPEN
      labels: ["ai-blocked"]
    ) {
      nodes {
        number
        comments(last: 5) {
          nodes {
            createdAt
            author { login }
          }
        }
      }
    }
  }
}
```

```bash
# Create worktree with new branch from latest main
git fetch origin main
git worktree add -b ai/issue-42-slug .worktrees/issue-42 origin/main

# Work in worktree
cd .worktrees/issue-42
# ... make changes ...
git add .
git commit -m "feat: implement feature X"
git push -u origin ai/issue-42-slug

# Return to the main repository root; the paths below are relative to it
cd ../..

# List all worktrees
git worktree list

# Clean up (after PR merged, respecting retention policy)
git worktree remove .worktrees/issue-42
git branch -d ai/issue-42-slug
```

```bash
# Run with message (headless)
opencode run "Your prompt here"

# Continue session
opencode run --session <session-id> "Continue message"

# With specific model
opencode run --model anthropic/claude-opus-4-5 "Message"

# Share session
# (Done via /share command within session)
```

```
# Start Ralph Loop for implementation
/ralph-loop "Implement feature X. Test thoroughly. End with <promise>DONE</promise>."

# Ralph Loop for quality fixes
/ralph-loop "Fix all lint and type errors. Re-run until all pass."

# Ralph Loop for testing
/ralph-loop "Run all tests. Fix failures. Generate missing tests. End with <promise>DONE</promise>."
```

| Version | Date | Author | Changes |
|---|---|---|---|
| 1.0 | 2026-01-02 | AI-Assisted | Initial specification |
| 2.0 | 2026-01-02 | AI-Assisted | Added Ralph Loop, quality gates, Playwright E2E, CI/CD integration, checkpoint management, rollback/debugging |
End of Technical Specification v2.0