|
| 1 | +# agentgui Comprehensive Testing Plan |
| 2 | + |
| 3 | +## STATUS: RESOLVED - SERVER RUNNING |
| 4 | +**✅ Fixed:** Using `@anthropic-ai/claude-code` SDK directly |
| 5 | +- Package: `@anthropic-ai/claude-code@^1.0.128` |
| 6 | +- API: `query({ prompt, options })` for streaming responses |
| 7 | +- acp-launcher.js: Refactored to use query() API |
| 8 | +- Server Status: Running on http://localhost:3000/gm/ |
| 9 | +- UI Status: Fully loaded, connected, showing chat history |
| 10 | + |
| 11 | +--- |
| 12 | + |
| 13 | +## NEW REQUIREMENT: Import OpenCode Conversations |
| 14 | +**User Request:** "we must also import all the opencode conversations from the system and keep them in sync too" |
| 15 | + |
| 16 | +**Implementation Plan:** |
| 17 | + |
| 18 | +### Phase 1: Claude Code History Import (PRIORITY 1) - COMPLETE ✅ |
| 19 | +- [x] Load `~/.claude/projects/*/sessions-index.json` on server startup |
| 20 | +- [x] Store in database with agent_type='claude-code', source='imported' |
| 21 | +- [x] Database schema migrated with 9 new columns (agentType, source, externalId, projectPath, gitBranch, sourcePath, lastSyncedAt, firstPrompt, messageCount) |
| 22 | +- [x] Implement queries.importClaudeCodeConversations() |
| 23 | +- [x] Auto-import on server startup and every 30 seconds |
| 24 | +- [x] Successfully imported 90 Claude Code conversations |
| 25 | +- [x] Test confirms import works correctly (177 total conversations: 87 native + 90 imported) |
| 26 | +- [ ] FUTURE: Display in sidebar with [claude-code] prefix (client-side formatting - nice-to-have) |
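Since the import re-runs every 30 seconds, it has to be idempotent: a second pass over the same index must not create duplicate rows. The sketch below illustrates one way to get that by keying on `externalId`. It is not the actual `queries.importClaudeCodeConversations()` implementation; the in-memory array stands in for the conversations table, and the entry shape (`externalId`, `sourcePath`, `messageCount`) is an assumption based on the column names listed above.

```javascript
// Hypothetical in-memory stand-in for the conversations table.
// Field names follow the columns listed above; the entry shape is assumed.
function importConversations(existing, entries, now = Date.now()) {
  const byExternalId = new Map();
  for (const row of existing) {
    if (row.externalId) byExternalId.set(row.externalId, row);
  }

  let added = 0;
  for (const entry of entries) {
    const known = byExternalId.get(entry.externalId);
    if (known) {
      // Already imported: refresh sync metadata instead of inserting again.
      known.lastSyncedAt = now;
      known.messageCount = entry.messageCount;
      continue;
    }
    const row = {
      agentType: "claude-code",
      source: "imported",
      externalId: entry.externalId,
      sourcePath: entry.sourcePath,
      messageCount: entry.messageCount,
      lastSyncedAt: now,
    };
    existing.push(row);
    byExternalId.set(row.externalId, row);
    added += 1;
  }
  return added; // number of newly inserted rows
}
```

Running the same import twice should report new rows only on the first pass, which is the behaviour the 30-second timer relies on.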
| 27 | + |
| 28 | +### Phase 2: OpenCode Integration (PRIORITY 2) |
| 29 | +- [ ] Determine OpenCode conversation storage location (TBD) |
| 30 | +- [ ] Implement OpenCode history loader (parallel to Claude Code) |
| 31 | +- [ ] Merge both sources in database |
| 32 | +- [ ] Display merged history in sidebar |
| 33 | +- [ ] Support agent type filtering |
| 34 | + |
| 35 | +### Phase 3: Sync & Consistency (PRIORITY 3) |
| 36 | +- [ ] File watchers for history changes |
| 37 | +- [ ] Auto-import new conversations from CLI |
| 38 | +- [ ] Keep GUI database in sync with filesystem |
| 39 | +- [ ] Handle conflicts (same conversation in both systems) |
| 40 | +- [ ] Implement read-only mode for imported conversations |
| 41 | + |
| 42 | +### Database Schema Changes |
| 43 | +```sql |
| 44 | +ALTER TABLE conversations ADD COLUMN agent_type TEXT DEFAULT 'claude-code'; |
| 45 | +ALTER TABLE conversations ADD COLUMN source TEXT DEFAULT 'gui'; -- 'gui', 'imported' |
| 46 | +ALTER TABLE conversations ADD COLUMN source_path TEXT; -- path to original file |
| 47 | +ALTER TABLE conversations ADD COLUMN last_synced_at INTEGER; -- Unix timestamp |
| 48 | +``` |
| 49 | + |
| 50 | +**Status:** Phase 1 implemented; Phases 2 and 3 not yet started |
| 51 | +**Blocking:** None for Phase 1 (complete); testing can proceed |
| 52 | + |
| 53 | +--- |
| 54 | + |
| 55 | +## TESTING PROGRESS SUMMARY |
| 56 | + |
| 57 | +### Completed Work |
| 58 | +✅ **Phase 1: Claude Code Conversation Import - COMPLETE** |
| 59 | +- Database schema migrated successfully (9 new columns added) |
| 60 | +- 90 Claude Code conversations imported from ~/.claude/projects/*/sessions-index.json |
| 61 | +- Auto-import runs on server startup and every 30 seconds thereafter |
| 62 | +- Verified: 177 total conversations (87 native + 90 imported) |
| 63 | +- Verified: Imported conversations have correct metadata (source, agentType, externalId) |
| 64 | + |
| 65 | +✅ **Server & Connectivity** |
| 66 | +- Server running on http://localhost:3000/gm/ |
| 67 | +- UI fully loads with 177 conversations displayed |
| 68 | +- WebSocket sync connection active and working |
| 69 | +- Status indicator shows "Connected" |
| 70 | +- Page title shows conversation count (177) |
| 71 | + |
| 72 | +### Known Issues Found During Testing |
| 73 | +⚠️ **Issue #1: Timeout on Conversation Selection** |
| 74 | +- Clicking a conversation causes a code execution timeout |
| 75 | +- Suggests a performance issue or UI hang |
| 76 | +- Needs investigation into click handlers |
| 77 | + |
| 78 | +⚠️ **Issue #2: Message Input Field** |
| 79 | +- Message input found but submission behavior unclear |
| 80 | +- May not be properly wired to agent communication |
| 81 | +- Needs testing with agent actually selected |
| 82 | + |
| 83 | +### Testing Categories (12 Total) |
| 84 | + |
| 85 | +### Category 1: Server Startup & Connection (UNBLOCKED) |
| 86 | +- [ ] Server starts without errors |
| 87 | +- [ ] HTTP server listens on PORT (3000 or custom) |
| 88 | +- [ ] Static files served correctly |
| 89 | +- [ ] WebSocket endpoint available at /ws |
| 90 | +- [ ] Initial page load completes without JS errors |
| 91 | +- [ ] WebSocket connection establishes on page load |
| 92 | +- [ ] Receives sync_connected message from server |
| 93 | +- [ ] Console shows no errors or warnings |
| 94 | +- **Former blocker (resolved, see STATUS):** Server could not start without claude-code-acp; it now uses the `@anthropic-ai/claude-code` SDK directly |
| 95 | + |
| 96 | +### Category 2: Real-Time Streaming with Persistence |
| 97 | +- [ ] Send test message from UI |
| 98 | +- [ ] Message appears in real-time before database confirmation |
| 99 | +- [ ] StreamHandler.persistAndBroadcast executes atomically |
| 100 | +- [ ] stream_updates table receives entry with correct sequence |
| 101 | +- [ ] WebSocket broadcasts update only after database write completes (write-before-broadcast) |
| 102 | +- [ ] Sequence number increments correctly (no gaps) |
| 103 | +- [ ] Update contains correct: sessionId, conversationId, updateType, timestamp |
| 104 | +- [ ] Multiple rapid messages maintain order and sequence |
| 105 | +- [ ] Very large messages (10MB+) handle gracefully |
| 106 | +- [ ] Empty messages are rejected or handled safely |
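The ordering and sequence checks above can be exercised against a small model of the handler. This is a minimal sketch, not the real `StreamHandler`: it assumes "write-before-broadcast" means the record is persisted before it is sent to clients, and `persist` and `broadcast` are hypothetical callbacks standing in for the database and WebSocket layers.

```javascript
// Sketch of a write-before-broadcast handler. `persist` and `broadcast`
// are hypothetical callbacks, not the project's actual API.
function createStreamHandler(persist, broadcast) {
  let nextSequence = 1;
  return {
    persistAndBroadcast(update) {
      if (typeof update.content !== "string" || update.content.length === 0) {
        throw new Error("empty update rejected");
      }
      const record = { ...update, sequence: nextSequence, timestamp: Date.now() };
      // Persist first: if this throws, nothing is broadcast and the
      // sequence number is not consumed, so no gap is created.
      persist(record);
      nextSequence += 1;
      broadcast(record);
      return record;
    },
  };
}
```

A test can pass in callbacks that append to a shared log and then assert that every `persist` entry precedes the matching `broadcast` entry and that sequence numbers increase by exactly one.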
| 107 | + |
| 108 | +### Category 3: HTML Rendering Without Text Mixing |
| 109 | +- [ ] Agent response renders HTML blocks only (no plain text below) |
| 110 | +- [ ] HTML code blocks extracted correctly: /\`\`\`html\n([\s\S]*?)\n\`\`\`/ |
| 111 | +- [ ] Plain text fallback NEVER triggers |
| 112 | +- [ ] RippleUI components render visually correct |
| 113 | +- [ ] Inline HTML display shows no artifacts or escape sequences |
| 114 | +- [ ] HTML content sanitized (no XSS vectors) |
| 115 | +- [ ] SVG content within HTML renders correctly |
| 116 | +- [ ] Nested HTML structures parse correctly |
| 117 | +- [ ] HTML with special characters escapes properly |
| 118 | +- [ ] Large HTML blocks (100KB+) render without lag |
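The extraction pattern listed above matches only the first block unless it is applied globally. The sketch below shows one way to collect every html block from a response; the function name `extractHtmlBlocks` is hypothetical, and the fence string is built with `repeat` only so the pattern can be shown inside a fenced block.

```javascript
// Same pattern as in the checklist, with the global flag added so that
// every html block in a response is returned, not just the first.
const FENCE = "`".repeat(3);
const HTML_BLOCK = new RegExp(FENCE + "html\\n([\\s\\S]*?)\\n" + FENCE, "g");

function extractHtmlBlocks(responseText) {
  // matchAll works on a copy of the regex, so HTML_BLOCK keeps no state between calls.
  return Array.from(responseText.matchAll(HTML_BLOCK), (match) => match[1]);
}
```

An empty result means the response contained no html block, which is the case the "plain text fallback NEVER triggers" check needs to watch for.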
| 119 | + |
| 120 | +### Category 4: Theme Compliance (Light/Dark) |
| 121 | +- [ ] Page detects system dark/light preference on load |
| 122 | +- [ ] Toggle between light/dark mode works |
| 123 | +- [ ] Theme preference persists across page reloads |
| 124 | +- [ ] RippleUI generated content uses Tailwind classes: text-gray-700 (light), text-gray-300 (dark) |
| 125 | +- [ ] Background colors don't clash with theme: light bg-white/bg-gray-50, dark bg-gray-900/bg-gray-800 |
| 126 | +- [ ] No hardcoded color hex codes in generated HTML |
| 127 | +- [ ] Progress bars use theme-aware colors |
| 128 | +- [ ] Cards/sections adapt to theme automatically |
| 129 | +- [ ] Form inputs styled appropriately for theme |
| 130 | +- [ ] Text contrast ratios meet accessibility standards (WCAG AA) |
| 131 | + |
| 132 | +### Category 5: Advanced RippleUI Components |
| 133 | +- [ ] **Progress Bars:** Display correctly, update smoothly, show percentage text |
| 134 | +- [ ] **Grid Layouts:** Cards arrange responsively, wrap on mobile |
| 135 | +- [ ] **Collapsible Sections:** Toggle state persists visually, smooth animations |
| 136 | +- [ ] **Two-Column Layouts:** Content splits evenly, responsive on narrow screens |
| 137 | +- [ ] **Badge/Pill Labels:** Display with correct styling, multiple variants work |
| 138 | +- [ ] **Timeline Visualizations:** Vertical/horizontal orientation, connection lines render |
| 139 | +- [ ] **Icon + Text Combinations:** Icons align correctly, text wraps properly |
| 140 | +- [ ] **Buttons:** Various states (default, hover, active, disabled) visible |
| 141 | +- [ ] **Code Blocks:** Syntax highlighting works, scrollable for long code |
| 142 | +- [ ] **Tables:** Content aligns, scrollable on mobile, alternating row colors |
| 143 | + |
| 144 | +### Category 6: Interactive Forms & Input Validation |
| 145 | +- [ ] **Text Input:** Accepts input, displays typed text |
| 146 | +- [ ] **Email Input:** Validates email format (basic HTML5 validation) |
| 147 | +- [ ] **Textarea:** Accepts multi-line input, text wraps correctly |
| 148 | +- [ ] **Select Dropdown:** Opens/closes, options selectable, shows selected value |
| 149 | +- [ ] **Checkboxes:** Toggle on/off, multiple selections work, state persists visually |
| 150 | +- [ ] **Radio Buttons:** Mutually exclusive selection, only one active at a time |
| 151 | +- [ ] **Form Submit:** Click triggers handler, data captured correctly |
| 152 | +- [ ] **Form Data Capture:** Submitted data includes all field values |
| 153 | +- [ ] **Validation Messages:** Error messages display when validation fails |
| 154 | +- [ ] **Form Reset:** Reset button clears all fields, restores defaults |
| 155 | + |
| 156 | +### Category 7: Database Persistence & Recovery |
| 157 | +- [ ] Session created in database on new connection |
| 158 | +- [ ] Conversation created with correct sessionId |
| 159 | +- [ ] Each stream_update persists with atomic sequence |
| 160 | +- [ ] Messages survive database restart (data integrity) |
| 161 | +- [ ] Refresh page shows all previous messages in order |
| 162 | +- [ ] State checkpoints created for recovery (latest 5 versions) |
| 163 | +- [ ] Gap detection identifies missing sequence numbers |
| 164 | +- [ ] Full state recovery request fetches missing data |
| 165 | +- [ ] Duplicate updates prevented (idempotency) |
| 166 | +- [ ] Orphaned data cleaned up (no dangling references) |
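Gap detection reduces to finding the integers missing between the lowest and highest sequence numbers seen. The helper below is a minimal sketch of that idea; `findSequenceGaps` is a hypothetical name, and it assumes sequence numbers are integers that may arrive out of order or duplicated.

```javascript
// Returns the sequence numbers missing between the smallest and largest
// values received. Duplicates are ignored; input order does not matter.
function findSequenceGaps(sequences) {
  const sorted = [...new Set(sequences)].sort((a, b) => a - b);
  const missing = [];
  for (let i = 1; i < sorted.length; i += 1) {
    for (let seq = sorted[i - 1] + 1; seq < sorted[i]; seq += 1) {
      missing.push(seq);
    }
  }
  return missing;
}
```

A non-empty result is what would trigger the full state recovery request; a client could send the returned list to the server to fetch exactly the missing updates.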
| 167 | + |
| 168 | +### Category 8: State Consistency & Validation |
| 169 | +- [ ] StateValidator.validateSession identifies state errors |
| 170 | +- [ ] Checksum validation detects corrupted updates |
| 171 | +- [ ] Sequence gaps trigger recovery request |
| 172 | +- [ ] Session state returns complete picture for recovery |
| 173 | +- [ ] Concurrent updates don't cause race conditions |
| 174 | +- [ ] Update count matches database row count |
| 175 | +- [ ] Block count accurate for all content types |
| 176 | +- [ ] No duplicate messages in UI |
| 177 | +- [ ] Message order matches sequence numbers exactly |
| 178 | +- [ ] State validator runs async (doesn't block broadcast) |
| 179 | + |
| 180 | +### Category 9: Reconnection & Recovery |
| 181 | +- [ ] WebSocket disconnect detected within 5 seconds |
| 182 | +- [ ] Exponential backoff attempts: 1s, 2s, 4s, 8s, 16s, 32s, 60s |
| 183 | +- [ ] Max 10 reconnect attempts before giving up |
| 184 | +- [ ] Reconnect succeeds after network recovery |
| 185 | +- [ ] Missed messages fetched on reconnect via gap detection |
| 186 | +- [ ] State checkpoint provides recovery baseline |
| 187 | +- [ ] Fast reconnect (< 100ms) shows no duplicate messages |
| 188 | +- [ ] Slow network (high latency) handles gracefully |
| 189 | +- [ ] Multiple rapid disconnect/connect cycles work |
| 190 | +- [ ] Persistent connection maintained for 1+ hours |
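The backoff schedule above is a doubling delay capped at 60 seconds. A short sketch of how the expected delays could be generated for a test follows; the function name and default parameters are assumptions chosen to match the numbers in the checklist.

```javascript
// Doubling delay starting at baseMs, capped at capMs, for maxAttempts tries.
function reconnectDelays(maxAttempts = 10, baseMs = 1000, capMs = 60000) {
  const delays = [];
  for (let attempt = 0; attempt < maxAttempts; attempt += 1) {
    delays.push(Math.min(baseMs * 2 ** attempt, capMs));
  }
  return delays;
}
```

With the defaults, the first seven delays are 1s, 2s, 4s, 8s, 16s, 32s, and 60s, and the remaining three attempts stay at the 60s cap.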
| 191 | + |
| 192 | +### Category 10: Error Handling & Edge Cases |
| 193 | +- [ ] Malformed JSON in stream updates rejected |
| 194 | +- [ ] Invalid session ID returns 404 |
| 195 | +- [ ] Missing conversationId handled gracefully |
| 196 | +- [ ] Null/undefined content doesn't crash |
| 197 | +- [ ] Very large payloads (100MB) handled or rejected |
| 198 | +- [ ] Rapid-fire bursts (1000 messages/second) buffered correctly |
| 199 | +- [ ] Single character message processes correctly |
| 200 | +- [ ] Message with only whitespace processed |
| 201 | +- [ ] HTML with syntax errors parsed safely |
| 202 | +- [ ] Database locked by another process doesn't block the server |
| 203 | + |
| 204 | +### Category 11: Performance & Latency |
| 205 | +- [ ] First message appears in UI within 50ms of broadcast |
| 206 | +- [ ] Database write + broadcast takes < 100ms total |
| 207 | +- [ ] Page load from blank to interactive < 2 seconds |
| 208 | +- [ ] 1000 message history loads in < 1 second |
| 209 | +- [ ] WebSocket messages deliver within 50ms (local network) |
| 210 | +- [ ] Memory usage stays under 500MB (after 10k messages) |
| 211 | +- [ ] CPU usage < 5% idle (no busy loops) |
| 212 | +- [ ] Smooth scrolling with 1000+ messages rendered |
| 213 | +- [ ] No jank when switching themes |
| 214 | +- [ ] Form submission responsive (< 200ms) |
| 215 | + |
| 216 | +### Category 12: Multi-Agent & Configuration |
| 217 | +- [ ] Agent type selection works (claude-code vs opencode) |
| 218 | +- [ ] CLI config loaded from ~/.claude/config.json |
| 219 | +- [ ] OpenCode config loaded from ~/.opencode/config.json |
| 220 | +- [ ] Environment variables passed through (HOME, PATH, etc.) |
| 221 | +- [ ] OAuth tokens handled correctly |
| 222 | +- [ ] Model preferences applied from config |
| 223 | +- [ ] Multiple agents can run in same session |
| 224 | +- [ ] Agent switching preserves conversation history |
| 225 | +- [ ] Session isolation prevents cross-contamination |
| 226 | +- [ ] Configuration changes apply without restart |
| 227 | + |
| 228 | +--- |
| 229 | + |
| 230 | +## Issue Categories & Fixes |
| 231 | + |
| 232 | +### Issue Tracking Template |
| 233 | +``` |
| 234 | +[ISSUE-###] Category: <1-12> |
| 235 | +Severity: Critical | High | Medium | Low |
| 236 | +Status: New | In Progress | Fixed | Verified |
| 237 | +Description: <what fails> |
| 238 | +Steps to Reproduce: <how to trigger> |
| 239 | +Expected: <what should happen> |
| 240 | +Actual: <what happens instead> |
| 241 | +Fix Applied: <how fixed> |
| 242 | +Verification: <how verified> |
| 243 | +``` |
| 244 | + |
| 245 | +### Known Issues |
| 246 | +1. **[RESOLVED] Missing claude-code-acp package** |
| 247 | +   - Status: Resolved (previously blocked all testing) |
| 248 | +   - Fix: acp-launcher.js refactored to use the `@anthropic-ai/claude-code` SDK `query()` API |
| 249 | +   - Impact: Server could not start before the fix; it now runs |
| 250 | + |
| 251 | +--- |
| 252 | + |
| 253 | +## Testing Execution Flow |
| 254 | + |
| 255 | +1. **Resolve Dependency Blocker** |
| 256 | + - Clarify `claude-code-acp` source |
| 257 | + - Install package or replace import |
| 258 | + - Verify server starts successfully |
| 259 | + |
| 260 | +2. **Category 1: Server Startup** (Foundation) |
| 261 | + - If passes: Continue |
| 262 | + - If fails: Fix blocker, retry |
| 263 | + |
| 264 | +3. **Categories 2-12: Feature Testing** (Parallel where possible) |
| 265 | + - Execute in dependency order: |
| 266 | + - Streaming (2) → Persistence (7) |
| 267 | + - HTML Rendering (3) → Theme (4) → Components (5) |
| 268 | + - Forms (6) → Validation (6) |
| 269 | + - State (8) → Reconnection (9) |
| 270 | + - Error Handling (10) |
| 271 | + - Performance (11) |
| 272 | + - Multi-Agent (12) |
| 273 | + |
| 274 | +4. **Issue Remediation** |
| 275 | + - Fix each issue as discovered |
| 276 | + - Re-run affected test categories |
| 277 | + - Document root cause |
| 278 | + |
| 279 | +5. **Final Verification** |
| 280 | + - Full end-to-end workflow test |
| 281 | + - All 12 categories passing |
| 282 | + - No blockers remaining |
| 283 | + - Performance benchmarks met |
| 284 | + |
| 285 | +--- |
| 286 | + |
| 287 | +## Success Criteria (All Must Pass) |
| 288 | +- [ ] Server starts and stays running |
| 289 | +- [ ] All 12 test categories pass |
| 290 | +- [ ] No critical or high severity issues |
| 291 | +- [ ] Response time < 100ms average |
| 292 | +- [ ] Zero data loss on disconnect/reconnect |
| 293 | +- [ ] HTML rendering clean (no text mixing) |
| 294 | +- [ ] Theme compliance verified |
| 295 | +- [ ] Forms fully functional |
| 296 | +- [ ] Database persistence verified |
| 297 | +- [ ] Production ready |
| 298 | + |
| 299 | +--- |
| 300 | + |
| 301 | +**Last Updated:** 2026-02-03 |
| 302 | +**Testing Status:** UNBLOCKED - Dependency resolved; Issues #1 and #2 open |
| 303 | +**Next Step:** Investigate Issue #1 (timeout on conversation selection) |