# State Machine Implementation - Checklist & Reference

## ✅ Completed Features

### Core State Machine
- [x] StateManager class with 9 defined states
- [x] State transition validation
- [x] Invalid transition guards (throw errors)
- [x] State history tracking with timestamps
- [x] Reason/metadata for each transition
- [x] Automatic 120-second timeout watchdog
- [x] Promise-based completion API
- [x] Terminal state detection
- [x] State history retrieval
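
This is a minimal sketch of how history tracking, terminal-state detection and the promise-based completion API can fit together. `MiniStateManager` is a hypothetical, simplified stand-in for the real `StateManager`, whose source is not reproduced here. It skips transition-table validation and the timeout watchdog, and it assumes history entries shaped `{state, timestamp, reason, details}` as listed under "Get Full History". Settling one promise on the first terminal state is what lets `waitForCompletion()` resolve with `result.data.duration` on success and reject on error, timeout or cancellation.

```javascript
// Hypothetical, simplified stand-in for StateManager. Transition-table
// validation and the 120 s watchdog are deliberately omitted.
const TERMINAL_STATES = new Set(['completed', 'error', 'timeout', 'cancelled']);

class MiniStateManager {
  constructor(sessionId) {
    this.sessionId = sessionId;
    this.startedAt = Date.now();
    this.state = 'pending';
    // Each history entry records the state, when it was entered, and why.
    this.history = [{ state: 'pending', timestamp: this.startedAt, reason: 'initialized', details: {} }];
    // Capture resolve/reject once so waitForCompletion() can hand out one promise.
    this.completion = new Promise((resolve, reject) => {
      this.resolveCompletion = resolve;
      this.rejectCompletion = reject;
    });
    // Attach a no-op handler so a rejection with no waiter is not reported as unhandled.
    this.completion.catch(() => {});
  }

  isTerminal() {
    return TERMINAL_STATES.has(this.state);
  }

  transition(nextState, { reason = '', data = {} } = {}) {
    // Guard: once terminal, the session must not move again.
    if (this.isTerminal()) {
      throw new Error(`Invalid state transition: ${this.state} → ${nextState} (session already terminal)`);
    }
    this.state = nextState;
    this.history.push({ state: nextState, timestamp: Date.now(), reason, details: data });

    // Settle the completion promise when a terminal state is reached.
    if (nextState === 'completed') {
      const duration = `${Date.now() - this.startedAt}ms`;
      this.resolveCompletion({ state: nextState, data: { ...data, duration } });
    } else if (this.isTerminal()) {
      this.rejectCompletion(new Error(`Session ${this.sessionId} ended in ${nextState}: ${reason}`));
    }
  }

  getState() {
    return this.state;
  }

  getHistory() {
    // Return a copy so callers cannot reorder or truncate the internal array.
    return this.history.slice();
  }

  waitForCompletion() {
    return this.completion;
  }
}
```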

### Session Management
- [x] SessionStateStore global registry
- [x] Session creation with ID tracking
- [x] Session retrieval and validation
- [x] Active session filtering
- [x] Terminal session tracking
- [x] Automatic cleanup of terminal sessions older than 1 hour
- [x] Diagnostic aggregation
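
One way to structure the registry, shown as a sketch, is a `Map` keyed by session ID, with the active/terminal split derived from each session's current state. `SketchSession` and `SketchSessionStore` are hypothetical names. The method names `create`, `get`, `getOrThrow` and `getDiagnostics` follow the usage examples in this document, but their bodies here are assumptions. The sessions are minimal stand-ins that only model the fields the store reads, and cleanup is left out.

```javascript
// Hypothetical sketch of a SessionStateStore-style registry.
const TERMINAL = new Set(['completed', 'error', 'timeout', 'cancelled']);

// Minimal stand-in session: only the fields the store reads are modeled.
class SketchSession {
  constructor(sessionId, conversationId, messageId, timeoutMs) {
    Object.assign(this, { sessionId, conversationId, messageId, timeoutMs });
    this.state = 'pending';
    this.startedAt = Date.now();
    this.endedAt = null;
  }

  setState(state) {
    this.state = state;
    if (TERMINAL.has(state)) this.endedAt = Date.now();
  }

  isTerminal() {
    return TERMINAL.has(this.state);
  }
}

class SketchSessionStore {
  constructor() {
    this.sessions = new Map(); // sessionId -> session
  }

  create(sessionId, conversationId, messageId, timeoutMs = 120000) {
    const session = new SketchSession(sessionId, conversationId, messageId, timeoutMs);
    this.sessions.set(sessionId, session);
    return session;
  }

  get(sessionId) {
    return this.sessions.get(sessionId) ?? null;
  }

  getOrThrow(sessionId) {
    const session = this.get(sessionId);
    if (!session) throw new Error(`Session not found: ${sessionId}`);
    return session;
  }

  getActive() {
    return [...this.sessions.values()].filter((s) => !s.isTerminal());
  }

  getTerminal() {
    return [...this.sessions.values()].filter((s) => s.isTerminal());
  }

  // Aggregate counts plus a summary of each active session.
  getDiagnostics(now = Date.now()) {
    const active = this.getActive();
    return {
      timestamp: new Date(now).toISOString(),
      activeSessions: active.length,
      terminalSessions: this.getTerminal().length,
      totalSessions: this.sessions.size,
      active: active.map((s) => ({ sessionId: s.sessionId, state: s.state, uptime: now - s.startedAt })),
    };
  }
}
```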

### Server Integration
- [x] Import StateManager in server.js
- [x] Create global SessionStateStore
- [x] Rewrite processMessage() to use the state machine
- [x] Add a state transition for each processing step
- [x] Implement error handling with state tracking
- [x] Add getACP() timeout protection (60s)
- [x] Create `GET /api/diagnostics/sessions` endpoint
- [x] Add logging for state transitions, processMessage() and getACP()
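
The 60 s guard around `getACP()` can be expressed with `Promise.race`. This is a sketch built around a hypothetical `withTimeout` helper; the actual code in `server.js` is not shown here. The race only stops *waiting*. It does not cancel the underlying connection attempt, which still needs its own cleanup. The timer is cleared in `finally` so it neither fires late nor keeps the process alive.

```javascript
// Hypothetical helper: reject if `promise` has not settled within `ms`.
function withTimeout(promise, ms, label = 'operation') {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  // Whichever settles first wins; the timer is always cleared afterwards.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// 60 s, mirroring the getACP() limit listed under Configuration.
const ACP_TIMEOUT_MS = 60000;

// Example (connectToAcp is a placeholder for whatever getACP() awaits):
// const acp = await withTimeout(connectToAcp(), ACP_TIMEOUT_MS, 'getACP');
```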

### Database Fixes
- [x] Fix message content type handling (stringify objects)
- [x] Fix session response/error serialization
- [x] Fix event data JSON handling
- [x] Fix idempotencyKeys type conversion
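
The "stringify objects" fix typically reduces to a pair of write/read helpers for a text column. The ones below are a hypothetical sketch: `toDbText` and `fromDbText` are not names taken from `database.js`. One caveat of passing strings through unchanged is that a plain string which happens to be valid JSON, such as `'123'`, reads back as the number `123`.

```javascript
// Hypothetical helpers for storing mixed values in a TEXT column.
// Strings pass through unchanged; other values are JSON-serialized.
function toDbText(value) {
  if (value === null || value === undefined) return null;
  return typeof value === 'string' ? value : JSON.stringify(value);
}

// Reverse direction: try JSON first, fall back to the raw string.
function fromDbText(text) {
  if (text === null || text === undefined) return null;
  try {
    return JSON.parse(text);
  } catch {
    return text; // plain string that was never JSON-encoded
  }
}
```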

### Documentation
- [x] StateManager code comments
- [x] Architecture diagrams
- [x] Usage examples
- [x] Monitoring guide
- [x] Diagnostics explanation
- [x] Issue diagnosis (ACP hang)
- [x] Next steps guide

---

## 📊 State Machine States

```
PENDING
   ↓
ACQUIRING_ACP   ← Connect to Claude Code ACP
   ↓
ACP_ACQUIRED    ← Connection established
   ↓
SENDING_PROMPT  ← Sending prompt to ACP
   ↓
PROCESSING      ← Processing response
   ↓
COMPLETED       ← ✅ Success

ERROR           ← ❌ Any step failed (at any point)
TIMEOUT         ← ❌ Exceeded 120 s (automatic)
CANCELLED       ← Stopped by user
```
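
The diagram can be encoded as a transition table. In this sketch, only the `pending` row comes from this document: the error message under "Invalid Transition" lists `[acquiring_acp, cancelled]` for it. The remaining rows are an assumption. Each in-progress state may either advance one step or fail to `error`, `timeout` or `cancelled`, and terminal states have no outgoing edges. `assertTransition` is a hypothetical helper that reproduces the documented error text.

```javascript
// State names as they appear in the logs and diagnostics output.
const S = Object.freeze({
  PENDING: 'pending',
  ACQUIRING_ACP: 'acquiring_acp',
  ACP_ACQUIRED: 'acp_acquired',
  SENDING_PROMPT: 'sending_prompt',
  PROCESSING: 'processing',
  COMPLETED: 'completed',
  ERROR: 'error',
  TIMEOUT: 'timeout',
  CANCELLED: 'cancelled',
});

// Assumed failure exits for every in-progress state.
const FAILURES = [S.ERROR, S.TIMEOUT, S.CANCELLED];

// Hypothetical table; only the PENDING row is confirmed by the documented error.
const TRANSITIONS = Object.freeze({
  [S.PENDING]: [S.ACQUIRING_ACP, S.CANCELLED],
  [S.ACQUIRING_ACP]: [S.ACP_ACQUIRED, ...FAILURES],
  [S.ACP_ACQUIRED]: [S.SENDING_PROMPT, ...FAILURES],
  [S.SENDING_PROMPT]: [S.PROCESSING, ...FAILURES],
  [S.PROCESSING]: [S.COMPLETED, ...FAILURES],
  // Terminal states: no outgoing transitions.
  [S.COMPLETED]: [],
  [S.ERROR]: [],
  [S.TIMEOUT]: [],
  [S.CANCELLED]: [],
});

// Throw with the same message format as the "Invalid Transition" example.
function assertTransition(from, to) {
  const valid = TRANSITIONS[from] ?? [];
  if (!valid.includes(to)) {
    throw new Error(`Invalid state transition: ${from} → ${to}. Valid: [${valid.join(', ')}]`);
  }
}
```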

---

## 🔍 Diagnostics Endpoint

**Endpoint**: `GET /api/diagnostics/sessions`

**Response Format**:
```javascript
{
  timestamp: string,             // ISO 8601
  activeSessions: number,
  terminalSessions: number,
  totalSessions: number,
  active: [
    {
      sessionId: string,
      state: string,
      uptime: number             // milliseconds
    }
  ],
  recentTerminal: [
    {
      sessionId: string,
      conversationId: string,
      messageId: string,
      state: 'completed' | 'error' | 'timeout' | 'cancelled',
      duration: string,          // e.g. '1234ms'
      historyLength: number,
      history: string[],         // e.g. ['0ms: pending (initialized)', ...]
      data: {
        fullTextLength: number,
        blocksCount: number,
        error: string | null,
        hasStackTrace: boolean
      }
    }
  ]
}
```
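
This is a sketch of how one `recentTerminal` entry could be built from a finished session. It assumes history records shaped `{state, timestamp, reason}` with millisecond timestamps, and offsets measured from the first record, which matches strings like `'0ms: pending (initialized)'`. It also assumes a hypothetical `session.data` object with `fullText`, `blocks`, `error` and `stack` fields. `toTerminalEntry` is an illustrative name, and the real field names may differ.

```javascript
// Hypothetical serializer for one recentTerminal entry.
function toTerminalEntry(session) {
  const { history } = session;
  const data = session.data ?? {}; // assumed shape: { fullText, blocks, error, stack }
  const start = history[0].timestamp;
  const last = history[history.length - 1];
  return {
    sessionId: session.sessionId,
    conversationId: session.conversationId,
    messageId: session.messageId,
    state: last.state,
    duration: `${last.timestamp - start}ms`,
    historyLength: history.length,
    // Offsets relative to the first record, e.g. '0ms: pending (initialized)'.
    history: history.map((h) => `${h.timestamp - start}ms: ${h.state} (${h.reason})`),
    data: {
      fullTextLength: data.fullText?.length ?? 0,
      blocksCount: data.blocks?.length ?? 0,
      error: data.error ?? null,
      hasStackTrace: Boolean(data.stack),
    },
  };
}
```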

---

## 🚀 Usage Examples

### Create a Session
```javascript
const stateManager = sessionStateStore.create(
  sessionId,
  conversationId,
  messageId,
  120000 // timeout in ms (120 s)
);
```

### Transition State
```javascript
stateManager.transition(StateManager.STATES.ACQUIRING_ACP, {
  reason: 'Starting ACP connection',
  data: {}
});
```

### Check Current State
```javascript
const state = stateManager.getState();
// 'pending' | 'acquiring_acp' | 'acp_acquired' | ...
```

### Get Full History
```javascript
const history = stateManager.getHistory();
// Array of {state, timestamp, reason, details}
```

### Wait for Completion
```javascript
// Inside an async function:
try {
  const result = await stateManager.waitForCompletion();
  console.log(`Success in ${result.data.duration}`);
} catch (err) {
  console.error(`Failed: ${err.message}`);
}
```

### Get Diagnostics
```javascript
const diag = sessionStateStore.getDiagnostics();
console.log(`Active: ${diag.activeSessions}`);
console.log(`Terminal: ${diag.terminalSessions}`);
```

---

## 🛡️ Error Handling

### Invalid Transition
```javascript
// This will throw while the session is still in PENDING:
stateManager.transition(StateManager.STATES.COMPLETED, {});
// Error: "Invalid state transition: pending → completed. Valid: [acquiring_acp, cancelled]"
```

### Session Not Found
```javascript
const manager = sessionStateStore.getOrThrow(sessionId);
// Throws if sessionId doesn't exist
```

### Timeout
```javascript
// After 120 seconds in any non-terminal state:
// Automatically transitions to TIMEOUT state
```
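
The watchdog can be sketched as a single timer that is armed when the session is created and cleared on the first terminal state. This sketch assumes the 120 s budget is measured from session creation rather than per state. `createWatchedSession` and its plain-object session are hypothetical stand-ins for `StateManager`. The timer is `unref`'d so an idle watchdog alone does not keep Node running.

```javascript
// Hypothetical watchdog sketch: one timer per session, measured from creation.
const TERMINAL = new Set(['completed', 'error', 'timeout', 'cancelled']);

function createWatchedSession(timeoutMs = 120000) {
  const session = { state: 'pending', history: [] };

  // If the session is still non-terminal when the timer fires, force TIMEOUT.
  const timer = setTimeout(() => {
    if (!TERMINAL.has(session.state)) setState('timeout', `Exceeded ${timeoutMs}ms`);
  }, timeoutMs);
  timer.unref?.(); // do not keep the process alive just for the watchdog

  function setState(state, reason) {
    session.state = state;
    session.history.push({ state, reason, timestamp: Date.now() });
    // Disarm the watchdog as soon as the session finishes.
    if (TERMINAL.has(state)) clearTimeout(timer);
  }

  return { session, setState };
}
```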

---

## 📝 Logging Output

### State Transition Log
```
[StateManager] sess-123 transitioned: pending → acquiring_acp (+1ms) | Starting ACP connection
[StateManager] sess-123 transitioned: acquiring_acp → acp_acquired (+25ms) | ACP connected
[StateManager] sess-123 transitioned: acp_acquired → sending_prompt (+0ms) | Sending to ACP
[StateManager] sess-123 transitioned: sending_prompt → processing (+100ms) | Processing response
[StateManager] sess-123 transitioned: processing → completed (+2145ms) | Response successfully generated
```
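
The log lines above follow a fixed pattern. What follows is a sketch of a formatter plus a small per-session logger that computes the `+Nms` figure. It assumes that figure is the time since the previous transition. The sample supports that reading, since `+25ms` is followed by `+0ms` and a running total could not decrease, but this document does not state it. `formatTransitionLog` and `makeTransitionLogger` are hypothetical names, and the clock and output are injectable so the logger can be exercised with fake timestamps.

```javascript
// Hypothetical formatter for the transition log line shown above.
function formatTransitionLog(sessionId, from, to, elapsedMs, reason) {
  const suffix = reason ? ` | ${reason}` : '';
  return `[StateManager] ${sessionId} transitioned: ${from} → ${to} (+${elapsedMs}ms)${suffix}`;
}

// Per-session logger that measures elapsed time since the previous transition.
function makeTransitionLogger(sessionId, { now = Date.now, write = console.log } = {}) {
  let last = now();
  return function logTransition(from, to, reason) {
    const t = now();
    const line = formatTransitionLog(sessionId, from, to, t - last, reason);
    last = t;
    write(line);
    return line;
  };
}
```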

### Process Message Log
```
[processMessage] Starting: conversationId=conv-123, sessionId=sess-456
[processMessage] Initial state: pending
[getACP] Step 1: Connecting to claude-code...
[getACP] Step 2: Connected, initializing...
[getACP] Step 3: Initialized, creating session...
[getACP] ✅ ACP connection ready for claude-code in /config
[processMessage] Sending prompt to ACP (45 chars)
[processMessage] ACP returned: stopReason=end_turn, fullText=12345 chars
[processMessage] ✅ Session completed: 2567ms
```
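
The same sequence can be outlined in code. This is a sketch, not the actual `processMessage()` from `server.js`. `getACP` and `acp.prompt` are injected stand-ins with assumed signatures, `response.fullText` is an assumed field, and `RecordingManager` is a hypothetical stub that only records states. The idea it illustrates is one transition per step, with any thrown error recorded as `error` unless the session is already terminal (for example, after the watchdog has moved it to `timeout`).

```javascript
// Hypothetical stub that records every state it is moved to.
class RecordingManager {
  constructor() {
    this.state = 'pending';
    this.states = ['pending'];
  }
  transition(state) {
    this.state = state;
    this.states.push(state);
  }
  isTerminal() {
    return ['completed', 'error', 'timeout', 'cancelled'].includes(this.state);
  }
}

// Sketch of a processMessage()-style flow: one transition per step.
async function processMessageSketch({ manager, getACP, promptText }) {
  try {
    manager.transition('acquiring_acp', { reason: 'Starting ACP connection' });
    const acp = await getACP();
    manager.transition('acp_acquired', { reason: 'ACP connected' });

    manager.transition('sending_prompt', { reason: 'Sending to ACP' });
    const response = await acp.prompt(promptText);

    manager.transition('processing', { reason: 'Processing response' });
    const fullText = response.fullText ?? '';

    manager.transition('completed', {
      reason: 'Response successfully generated',
      data: { fullTextLength: fullText.length },
    });
    return fullText;
  } catch (err) {
    // Record the failure unless the session already ended (e.g. timed out).
    if (!manager.isTerminal()) {
      manager.transition('error', { reason: err.message, data: { stack: err.stack } });
    }
    throw err;
  }
}
```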

---

## 🔧 Configuration

### Timeouts
- **Session timeout**: 120 seconds (hardcoded)
- **ACP timeout**: 60 seconds (hardcoded in getACP)
- **Session cleanup TTL**: 3600000ms (1 hour)

### Cleanup Schedule
- Runs every 10 minutes (600000ms)
- Removes terminal sessions older than 1 hour

### Data Retention
- Terminal sessions: kept in memory until they are older than 1 hour, then removed by the next cleanup run
- Cleanup prevents unbounded memory growth
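
The cleanup schedule amounts to a periodic sweep over the session registry. This sketch assumes sessions are kept in a `Map` and that each exposes `isTerminal()` plus an `endedAt` timestamp in milliseconds. Those fields and the function names `cleanupTerminalSessions` and `startCleanup` are assumptions. Age is measured from when the session ended, and active sessions are never removed by the sweep. Deleting from a `Map` while iterating it with `for...of` is safe in JavaScript.

```javascript
const CLEANUP_INTERVAL_MS = 600000; // 10 minutes
const SESSION_TTL_MS = 3600000;     // 1 hour

// Remove terminal sessions that ended more than ttlMs ago; return how many were removed.
function cleanupTerminalSessions(sessions, now = Date.now(), ttlMs = SESSION_TTL_MS) {
  let removed = 0;
  for (const [id, session] of sessions) {
    if (session.isTerminal() && now - session.endedAt > ttlMs) {
      sessions.delete(id);
      removed += 1;
    }
  }
  return removed;
}

// Schedule the sweep; returns a function that stops it.
function startCleanup(sessions) {
  const timer = setInterval(() => cleanupTerminalSessions(sessions), CLEANUP_INTERVAL_MS);
  timer.unref?.(); // let the process exit even while the interval is scheduled
  return () => clearInterval(timer);
}
```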

---

## 🐛 Debugging

The diagnostics endpoint returns JSON, which may be served on a single line, so these examples use `jq` rather than `grep -A` to select fields.

### See All Active Sessions
```bash
curl -s http://localhost:9899/gm/api/diagnostics/sessions | jq '.active'
```

### Find Stuck Sessions
```bash
curl -s http://localhost:9899/gm/api/diagnostics/sessions | jq '.active[] | select(.state == "acquiring_acp")'
```

### Get Session History
```bash
curl -s http://localhost:9899/gm/api/diagnostics/sessions | jq '.recentTerminal[] | {sessionId, state, duration, history}'
```

### Follow State Transitions
```bash
tail -f server.log | grep "StateManager"
```

### Find Errors
```bash
tail -f server.log | grep -E "ERROR|Stack:|❌"
```

---

## 📚 Files Modified

| File | Changes | Lines |
|------|---------|-------|
| state-manager.js | NEW | 350 |
| server.js | Modified | +300, -80 |
| database.js | Fixed | +40 |
| DIAGNOSTICS.md | NEW | 80 |
| STATE_MACHINE_SUMMARY.md | NEW | 220 |

---

## ✨ Key Improvements

**Before State Machine**:
- ❌ Fire-and-forget processing
- ❌ No visibility into failures
- ❌ Hangs produced no feedback
- ❌ Hidden race conditions
- ❌ Hard to debug

**After State Machine**:
- ✅ Every session tracked
- ✅ Current state and history visible via `/api/diagnostics/sessions`
- ✅ Hung sessions time out automatically after 120 s
- ✅ Explicit error handling
- ✅ Full audit trail