Talk to a coding agent on your phone. Push to talk, watch the transcript, hear
the reply spoken back. The agent is the claude CLI running in Termux on the
same device — so it has your repos and tools — and the app is a thin voice
front-end that reaches it over localhost. Nothing leaves the phone.
flowchart TB
subgraph phone["📱 one phone — nothing leaves the device"]
direction TB
subgraph app["Android app · this repo"]
direction TB
mic["🎙️ hold-to-talk<br/>16 kHz mono wav"]
tx["📝 transcript"]
spk["🔊 text-to-speech"]
end
subgraph bridge["Termux · bridge/ · Go · 127.0.0.1:8765"]
direction TB
stt["whisper.cpp<br/>speech → text"]
agent["claude -p --continue<br/>your repos + tools"]
end
end
mic -- "POST /stt" --> stt
stt -- "text" --> tx
tx -- "POST /chat" --> agent
agent -- "reply (streamed)" --> spk
classDef appbox fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b;
classDef bridgebox fill:#ecfdf5,stroke:#059669,color:#064e3b;
class mic,tx,spk appbox;
class stt,agent bridgebox;
- `app/`: Android app (Kotlin, minSdk 23). Hold-to-talk records 16 kHz mono PCM, POSTs it to the bridge, shows the transcript, and speaks the reply.
- `bridge/`: Termux-side localhost server, a single static Go binary. It hosts N agents (one per dir), serves `/stt` via whisper.cpp, and serves `/chat` via the `claude` CLI.
- `.drone.jsonnet`: CI. Builds the APK and publishes it to GitHub releases on tag.
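To exercise `/stt` without the phone app, a WAV in the same format the app records can be generated with Python's standard `wave` module. This is a minimal sketch: the 16 kHz rate and mono layout come from the description above, while the 16-bit sample width, the `make_test_wav` name, and the sine tone (a stand-in for speech) are assumptions made for illustration.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 16_000  # from the README: 16 kHz
CHANNELS = 1          # from the README: mono
SAMPLE_WIDTH = 2      # assumption: 16-bit PCM; the README does not state the bit depth


def make_test_wav(seconds: float = 1.0, freq_hz: float = 440.0) -> bytes:
    """Return WAV bytes holding a sine tone (a placeholder for recorded speech)."""
    n_frames = int(seconds * SAMPLE_RATE)
    samples = (
        int(0.3 * 32767 * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE))
        for i in range(n_frames)
    )
    # Little-endian signed 16-bit samples, as WAV PCM expects.
    pcm = b"".join(struct.pack("<h", s) for s in samples)

    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(CHANNELS)
        w.setsampwidth(SAMPLE_WIDTH)
        w.setframerate(SAMPLE_RATE)
        w.writeframes(pcm)
    return buf.getvalue()
```

A tone will not transcribe to anything meaningful; the point is only to confirm that the bridge accepts the upload format.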
- In Termux, build whisper.cpp and put `claude` on `PATH`, then run the bridge from your repo: `cd <repo> && ./claude-voice-bridge-arm64` (serves `127.0.0.1:8765`).
- Install the app APK (CI artifact or local build), open it, confirm the bridge URL, grant the mic permission, hold to talk.
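Before opening the app, it can help to confirm from Termux that the bridge is actually listening. The sketch below polls `GET /health` at the default address; the `bridge_is_up` helper is hypothetical, and it only checks for HTTP 200 rather than assuming an exact response body.

```python
import urllib.request

BRIDGE_URL = "http://127.0.0.1:8765"  # default address from the README

# Talk to localhost directly, ignoring any proxy set in the environment.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def bridge_is_up(base_url: str = BRIDGE_URL, timeout: float = 2.0) -> bool:
    """Return True if GET /health answers with HTTP 200."""
    try:
        with _OPENER.open(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection refused, timeouts, and HTTP error statuses
        # (urllib's URLError and HTTPError both derive from OSError).
        return False
```

If `bridge_is_up()` returns `False` right after launch, the usual suspects are a different `VOICE_PORT` or the bridge not running in the current Termux session.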
- Bridge: `./bridge/build.sh` produces a static `GOOS=android` binary at the repo root.
- App: needs the Android SDK (`compileSdk 34`); CI builds it (see `.drone.jsonnet`).
| Method | Path | Body | Returns |
|---|---|---|---|
| GET | `/health` | - | `ok` |
| GET | `/agents` | - | `[{id,name,dir,branch,dirty}]` |
| POST | `/agents` | `{"dir":"~/repo"}` | current agent list |
| DELETE | `/agents/<id>` | - | current agent list |
| GET | `/ls?dir=` | - | `{dir,parent,dirs}` |
| POST | `/stt` | WAV bytes | transcript (whisper.cpp) |
| POST | `/chat` | `{"text","agent":id}` | agent reply, streamed (`claude -p`) |
| GET | `/voices` | - | Piper voices (`[]` if off) |
| POST | `/tts` | `{"text","voice"}` | WAV audio (Piper) or `501` if off |
One agent per directory, each with its own claude --continue conversation.
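As a rough illustration of the agent and chat endpoints, the sketch below builds the two JSON requests and reads a streamed reply piece by piece. The helper names are hypothetical, and two details are assumptions rather than documented behaviour: that the bridge accepts a JSON `Content-Type`, and that the streamed reply is plain UTF-8 text. The agent id should be whatever `GET /agents` reports; its type is not specified here.

```python
import codecs
import json
import urllib.request

BRIDGE_URL = "http://127.0.0.1:8765"  # default address from the README

# Talk to localhost directly, ignoring any proxy set in the environment.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def _json_post(base_url, path, payload):
    """Build (but do not send) a JSON POST request."""
    return urllib.request.Request(
        f"{base_url}{path}",
        data=json.dumps(payload).encode("utf-8"),
        # Assumption: the bridge accepts a JSON content type.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def add_agent_request(directory, base_url=BRIDGE_URL):
    """POST /agents: register an agent for a directory."""
    return _json_post(base_url, "/agents", {"dir": directory})


def chat_request(text, agent_id, base_url=BRIDGE_URL):
    """POST /chat: send one turn to the agent with the given id."""
    return _json_post(base_url, "/chat", {"text": text, "agent": agent_id})


def stream_reply(req, chunk_size=256, timeout=1800):
    """Send req and yield the reply as text pieces as they arrive.

    Assumption: the streamed body is plain UTF-8 text.
    """
    # An incremental decoder keeps multi-byte characters intact when a
    # chunk boundary falls in the middle of one.
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    with _OPENER.open(req, timeout=timeout) as resp:
        while chunk := resp.read1(chunk_size):
            text = decoder.decode(chunk)
            if text:
                yield text
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail
```

Iterating over `stream_reply(chat_request("what changed on this branch?", agent_id))` and printing each piece gives a terminal view of the same stream the app speaks aloud. The 1800-second client timeout simply mirrors the bridge's default turn timeout.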
- `VOICE_PORT` / `VOICE_HOST`: default `8765` / `127.0.0.1`
- `VOICE_PERM`: claude permission mode (default `bypassPermissions`)
- `VOICE_TIMEOUT`: seconds before a stuck turn aborts (default `1800`)
- `VOICE_WORKDIR`: initial agent dir (default: home)
- `WHISPER_BIN` / `WHISPER_MODEL`, and the `PIPER_*` paths
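For a launcher or test script that needs to agree with the bridge on these settings, the defaults can be mirrored as below. This is an illustrative Python sketch, not the bridge's own Go code; the `BridgeConfig` and `load_config` names are hypothetical, and the whisper and Piper paths are left out because their defaults are not listed here.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class BridgeConfig:
    host: str
    port: int
    perm: str
    timeout_s: int
    workdir: str


def load_config(env=None) -> BridgeConfig:
    """Read the VOICE_* variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return BridgeConfig(
        host=env.get("VOICE_HOST", "127.0.0.1"),
        port=int(env.get("VOICE_PORT", "8765")),
        perm=env.get("VOICE_PERM", "bypassPermissions"),
        timeout_s=int(env.get("VOICE_TIMEOUT", "1800")),
        # The README gives the default as "home"; expanding "~" is this
        # sketch's interpretation of that.
        workdir=env.get("VOICE_WORKDIR", os.path.expanduser("~")),
    )
```

Passing a plain dict instead of `os.environ` makes the function easy to check in isolation, for example `load_config({"VOICE_PORT": "9000"}).port`.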
Install with ./install-piper.sh (needs glibc-runner). When piper is present at
~/piper, the bridge serves /tts and /voices; otherwise the app falls back to
Android TTS. Add voices by dropping <name>.onnx[.json] into ~/piper-voices.
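The fallback rule can be sketched from the client side: try `POST /tts`, and treat a `501` as "Piper is off, use local TTS instead". The `synthesize` helper is hypothetical, the JSON `Content-Type` is an assumption, and the voice name is expected to be one of the entries returned by `GET /voices`.

```python
import json
import urllib.error
import urllib.request
from typing import Optional

BRIDGE_URL = "http://127.0.0.1:8765"  # default address from the README

# Talk to localhost directly, ignoring any proxy set in the environment.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def synthesize(text: str, voice: str, base_url: str = BRIDGE_URL,
               timeout: float = 30.0) -> Optional[bytes]:
    """Return WAV bytes from POST /tts, or None if the caller should fall back.

    None means the bridge answered 501 (Piper not installed); any other
    HTTP error is re-raised so real failures are not silently hidden.
    """
    req = urllib.request.Request(
        f"{base_url}/tts",
        data=json.dumps({"text": text, "voice": voice}).encode("utf-8"),
        # Assumption: the bridge accepts a JSON content type.
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with _OPENER.open(req, timeout=timeout) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        if err.code == 501:
            return None
        raise
```

Returning `None` only for `501` keeps the two cases apart: a missing Piper install is an expected condition, while a `500` or a refused connection is a fault worth surfacing.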

