Skip to content

Feat/linux pulseaudio audio#152

Open
xingjianll wants to merge 8 commits into
mainfrom
feat/linux-pulseaudio-audio
Open

Feat/linux pulseaudio audio#152
xingjianll wants to merge 8 commits into
mainfrom
feat/linux-pulseaudio-audio

Conversation

@xingjianll

Copy link
Copy Markdown
Contributor

No description provided.

xingjianll and others added 8 commits April 6, 2026 00:52
On Linux, use parec/paplay subprocess to access PulseAudio/PipeWire
virtual devices (e.g. cable_a.monitor, cable_b) that sounddevice
can't see through ALSA. Device dropdown lists PulseAudio sources/sinks
via pactl. Falls back to sounddevice on macOS/Windows.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Single component replaces separate VAD + ASR pipeline. Streams audio
to Deepgram's WebSocket API (nova-3), returns final transcriptions
with speaker labels (e.g. "[Speaker 0] hello").

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Removes messages output, adds token/text/tool_calls/eos outputs (same as LLM)
- Calls OpenAI Responses API directly with streaming
- Recursive _call_llm for tool call → tool result → LLM loop
- Accepts ToolDef inputs for function calling
- Drains speech/feedback/vision/objects/pose/memory each iteration
- Only calls LLM when new user input arrives

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…tchpad tool

- AgentLoop calls OpenAI Responses API directly (no external LLM component)
- Accepts VideoFrame input, encodes as base64 JPEG for vision
- Built-in diary tool for tracking mental state (handled internally)
- ScratchpadTool component with read/update tools
- Continuous loop with no speech gating — agent is always active
- No recursion — flat loop with fresh transient context each iteration
- System prompt emphasizes embodiment and vision grounding

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Use ToolResult frame instead of TextFrame for tool_result input
- Use result.call_id instead of local tc dict for correct pairing
- Add head look heading alongside body heading in pose context
- Add debug prints for pose data

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant