
feat: voice dashboard, barge-in detection, kernel integration #1

Merged
chazmaniandinkle merged 3 commits into main from feat/dashboard-barge-in
Apr 13, 2026

Conversation

@chazmaniandinkle
Contributor

Summary

  • Voice dashboard at /dashboard — browser-based voice/text chat via WebSocket, Silero VAD v5, three inference providers (local Gemma 4, Ollama, CogOS kernel)
  • Barge-in detection — filesystem-based SuperWhisper recording detection, speak() returns "held" during user speech, interrupt context tracking
  • Kernel integration — agent loop pulls CogOS context per turn, exchanges logged to bus for observation by higher-level agents
  • Reliability fixes — non-blocking speak(), Kokoro pre-warm, WebSocket cleanup, graceful shutdown, standardized /health and /capabilities
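
The "held" behavior can be sketched roughly as follows. The signal-file path and JSON schema here are illustrative assumptions, not the actual format used by bargein-producer.py:

```python
import json
from pathlib import Path

# Hypothetical signal file written by the barge-in producer -- the real
# path and schema are not shown in this PR.
SIGNAL_FILE = Path("/tmp/bargein_signal.json")

def user_is_recording() -> bool:
    """True if the barge-in producer reports an active recording."""
    try:
        state = json.loads(SIGNAL_FILE.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return False
    return bool(state.get("recording"))

def speak(text: str) -> str:
    """Queue TTS unless the user is mid-utterance, in which case hold."""
    if user_is_recording():
        return "held"   # nothing is queued; the caller can retry after the user finishes
    # ...enqueue the TTS job here...
    return "queued"
```

Returning "held" instead of silently queueing is what prevents the "zombie queued jobs" the description mentions: audio that would otherwise play on top of (or right after) the user's own speech.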

New files

  • agent_loop.py — Gemma 4 tool-calling agent loop with kernel context injection
  • channels.py — BrowserChannel WebSocket adapter for dashboard
  • providers.py — MlxProvider, OllamaProvider, CogOSProvider + auto-detect
  • dashboard/ — HTML/JS/VAD assets for browser voice chat
  • integrations/bargein-producer.py — SuperWhisper barge-in signal producer

Test plan

  • python server.py --dashboard --port 7860, then open http://localhost:7860/dashboard
  • Text chat works (type message, get AI response + TTS)
  • Voice chat works (allow mic, speak, see transcript, hear response)
  • Barge-in: run bargein-producer.py, then start a SuperWhisper recording during TTS -> playback is interrupted
  • curl localhost:7860/health returns enriched status
  • curl localhost:7860/capabilities returns voice lists

Generated with Claude Code

chazmaniandinkle and others added 3 commits April 13, 2026 17:36
Dashboard:
- Browser-based voice/text chat at /dashboard via WebSocket (/ws/chat)
- Silero VAD v5 in-browser, PCM audio streaming, auto-reconnect
- Three inference providers: MlxProvider (local Gemma), OllamaProvider, CogOSProvider
- Agent loop with tool-calling (speak/send_text) — Gemma 4 E4B works natively
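
Provider auto-detection along these lines is a common pattern: probe each backend in priority order and fall back to the local model. The ports, URLs, and ordering below are assumptions for illustration, not the actual logic in providers.py:

```python
import urllib.request

def detect_provider() -> str:
    """Probe candidate backends in priority order; return the first one
    that answers, falling back to local MLX inference otherwise."""
    candidates = [
        ("cogos", "http://localhost:8000/health"),      # assumed kernel port
        ("ollama", "http://localhost:11434/api/tags"),  # Ollama's default port
    ]
    for name, url in candidates:
        try:
            urllib.request.urlopen(url, timeout=0.5)
            return name
        except OSError:  # URLError / timeout both subclass OSError
            continue
    return "mlx"  # local Gemma via MLX as the always-available fallback
```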

Barge-in:
- SuperWhisper recording detection via filesystem watching (bargein-producer.py)
- speak() returns "held" when user is recording — no zombie queued jobs
- Interrupt context written back to signal file (spoken_pct, delivered_text)
- 150ms poll, pure stdlib Python, zero external dependencies

Kernel integration:
- Agent loop pulls CogOS kernel context per turn (identity, state, barge-in history)
- Exchanges logged to CogOS bus for observation by higher-level agents
- CogOSProvider routes through kernel /v1/chat/completions
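
Routing through an OpenAI-style /v1/chat/completions endpoint typically looks like the sketch below. The base URL, model name, and the choice to inject kernel context as a system message are assumptions, not necessarily what CogOSProvider does:

```python
import json
import urllib.request

KERNEL_URL = "http://localhost:8000/v1/chat/completions"  # assumed endpoint

def build_request(user_msg: str, kernel_context: str) -> urllib.request.Request:
    """Build a chat-completions request with kernel context injected
    as the system prompt (injection shape is an assumption)."""
    payload = {
        "model": "cogos-kernel",  # illustrative model name
        "messages": [
            {"role": "system", "content": kernel_context},
            {"role": "user", "content": user_msg},
        ],
    }
    return urllib.request.Request(
        KERNEL_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```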

Reliability:
- Non-blocking speak() in agent loop (fire-and-forget via bus.act)
- Kokoro pre-warm on server startup (eliminates 60s cold start)
- WebSocket cleanup: cancel pending TTS jobs on disconnect
- Graceful /shutdown endpoint with drain + SIGINT
- Standardized /health with uptime, engine status, queue state
- /capabilities endpoint with dynamic voice lists
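
The fire-and-forget pattern behind the non-blocking speak() can be sketched with asyncio. The names here (tts_job, agent_turn) are hypothetical stand-ins for the real bus.act dispatch:

```python
import asyncio

spoken: list[str] = []

async def tts_job(text: str) -> None:
    """Stand-in for the real synthesis call dispatched via the bus."""
    await asyncio.sleep(0)      # real code would synthesize audio here
    spoken.append(text)

def speak(text: str) -> None:
    """Fire-and-forget: schedule the TTS job and return immediately,
    so the agent loop never blocks on audio playback."""
    asyncio.get_running_loop().create_task(tts_job(text))

async def agent_turn() -> str:
    speak("Working on it...")   # returns instantly
    return "next reply"         # the loop continues while audio plays
```

One caveat with fire-and-forget tasks: asyncio only keeps weak references to them, so production code should hold a reference (e.g. in a set) until the task completes, which also makes the disconnect-time cancellation described above straightforward.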

New files: agent_loop.py, channels.py, providers.py, dashboard/,
           integrations/bargein-producer.py
Modified: http_api.py, server.py, output_queue.py, requirements.txt

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove unused imports (typing.Any, WebSocketDisconnect) and unused
variable assignments (full, loop). Update /health smoke test assertions
to match the new structured response format (engines dict, modalities
dict, status can be 'degraded' when no engines loaded).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@chazmaniandinkle chazmaniandinkle merged commit 0277629 into main Apr 13, 2026
7 checks passed