fix(desktop): stabilize Soniqo realtime transcription#5206

Open: ComputelessComputer wants to merge 13 commits into main from soniqo



@ComputelessComputer ComputelessComputer commented May 14, 2026

Consolidates the Soniqo realtime transcription work into one branch.

  • Add local Soniqo speech-swift transcription support and model routing.
  • Validate Soniqo live language support and loopback model URLs.
  • Normalize Soniqo realtime partials so repeated cumulative text is replaced instead of appended.
  • Keep Soniqo partials ephemeral on flush while preserving model-final words.
  • Finalize native Soniqo live sessions before shutdown and clean up failed session state.
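As a rough illustration of the "replaced instead of appended" behavior in the list above, here is a minimal sketch. `LiveRow` and `apply_partial` are hypothetical names, not the types in `live_transcript.rs`, and the prefix check is an assumed heuristic for detecting a cumulative update.

```rust
/// Hypothetical live transcript row; not the type used in `live_transcript.rs`.
#[derive(Debug, Default)]
pub struct LiveRow {
    text: String,
}

impl LiveRow {
    /// Apply one partial update.
    ///
    /// A cumulative update (one that starts with the text already shown)
    /// replaces the row. Anything else is treated as a delta and appended.
    pub fn apply_partial(&mut self, update: &str) {
        let update = update.trim();
        if update.is_empty() {
            return;
        }
        if self.text.is_empty() || update.starts_with(self.text.as_str()) {
            // Cumulative snapshot: replace so the repeated prefix is not duplicated.
            self.text = update.to_string();
        } else {
            // Incremental delta: append after a separating space.
            self.text.push(' ');
            self.text.push_str(update);
        }
    }

    pub fn text(&self) -> &str {
        &self.text
    }
}
```

Feeding "hello" and then "hello world" leaves the row at "hello world" rather than "hello hello world".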

Supersedes #5190, #5201, and #5204.


Note

High Risk
Touches core transcription/session supervision code and introduces a new Swift-linked transcribe-soniqo backend with live/batch routing, so regressions could break recording/transcription flows or platform-specific builds (macOS Apple Silicon) if edge cases aren’t covered.

Overview
Stabilizes on-device STT by adding a new transcribe-soniqo crate (Swift-linked on macOS) and routing Soniqo models through both live streaming (listener-core) and batch (listener2-core) transcription paths, including loopback URL detection and language support validation.
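For readers unfamiliar with loopback URL detection, a std-only sketch follows. The function name `is_loopback_base_url` and the hand-rolled host parsing are assumptions for illustration; the crate may well use a URL parsing library and different rules.

```rust
use std::net::IpAddr;

/// Returns true when `base_url` points at a loopback host
/// (`localhost`, `127.0.0.0/8`, or `::1`). Hypothetical helper.
pub fn is_loopback_base_url(base_url: &str) -> bool {
    // Require an explicit scheme such as `http://`.
    let rest = match base_url.split_once("://") {
        Some((_, rest)) => rest,
        None => return false,
    };
    // The authority ends at the first path, query, or fragment delimiter.
    let authority = rest
        .split(|c: char| matches!(c, '/' | '?' | '#'))
        .next()
        .unwrap_or("");
    // Drop any `user:pass@` prefix.
    let host_port = authority.rsplit_once('@').map_or(authority, |(_, h)| h);
    // Bracketed IPv6 literals keep their colons; otherwise strip `:port`.
    let host = if let Some(stripped) = host_port.strip_prefix('[') {
        match stripped.split_once(']') {
            Some((h, _)) => h,
            None => return false,
        }
    } else {
        host_port.rsplit_once(':').map_or(host_port, |(h, _)| h)
    };
    if host.eq_ignore_ascii_case("localhost") {
        return true;
    }
    host.parse::<IpAddr>().map_or(false, |ip| ip.is_loopback())
}
```

With this sketch, `http://127.0.0.1:50060/v1` and `http://[::1]:8080/` count as local, while `https://api.example.com/v1` does not.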

Improves realtime partial handling for Soniqo by normalizing cumulative/overlapping updates in live_transcript so repeated text is replaced instead of appended, and updates session supervision to compute an effective transcription mode and stop sessions (instead of silently falling back) when a required Soniqo live listener fails.

Extends the desktop UI to surface Soniqo models, model sizes, runtime badges/icons, and realtime vs batch labeling, and adds configurable audio retention (audio_retention) with scheduled cleanup plus settings migration from legacy booleans.

Removes several CLI/mobile-specific GitHub workflows/actions and adjusts desktop_ci exclusions, and centralizes Parakeet TDT v3 language metadata in language for reuse across providers.

Reviewed by Cursor Bugbot for commit 687c3e7. Bugbot is set up for automated code reviews on this repo.


netlify Bot commented May 14, 2026

Deploy Preview for old-char ready!

Name Link
🔨 Latest commit 687c3e7
🔍 Latest deploy log https://app.netlify.com/projects/old-char/deploys/6a05f608b6025c0008172f0f
😎 Deploy Preview https://deploy-preview-5206--old-char.netlify.app

Comment thread crates/listener-core/src/live_transcript.rs
Comment thread crates/listener-core/src/live_transcript.rs Outdated
Comment thread crates/listener-core/src/actors/listener/adapters.rs
Comment thread crates/listener-core/src/live_transcript.rs
Comment thread crates/cactus-model/src/lib.rs Outdated
Comment thread crates/listener-core/src/live_transcript.rs
Add Soniqo local speech-swift models for realtime and batch transcription, derive local transcription mode from the selected model, and keep audio retention migrated to expiration-based cleanup.
Remove stale CLI PR workflows now that the cli package is gone, and add fallback content-collection exports used by web typecheck before generated content exists.
Drop stale desktop-bundled CLI artifacts and obsolete CLI/mobile workflows so the desktop app no longer carries the CLI bundle path and PR checks skip removed surfaces.
Recognize local Soniqo models from loopback base URLs in capture and listener mode selection, with tests covering the PR review case.
Add Wikimedia-sourced SVG assets for Qwen, NVIDIA, Meta, and OpenAI local STT model badges.
Route live language checks through shared listener validation, make Hyprnote Soniqo language support model-aware, and drop the unused on-device mode setting.
Move Parakeet TDT v3 language support into hypr-language and reuse it from Soniqo, Cactus, and Hyprnote checks.
Stop local Soniqo live sessions on listener failure and clear stopped supervisors before accepting new capture starts.
Retime cumulative Soniqo realtime partials so live transcript rows replace previous text instead of appending duplicates.
Drop leftover Soniqo partial snapshots instead of persisting them, and finalize native live sessions before shutdown.
Call the native Soniqo stream finalizer for each source before stopping so model-final text is emitted separately from live partial snapshots.
Treat rewritten Soniqo live partial snapshots as replacements while preserving sliding-window overlap behavior.
Refresh Soniqo timing after internal repeat collapse and represent unavailable Cactus model sizes as unknown through local STT metadata.
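The "sliding-window overlap behavior" mentioned in the commits above can be pictured with a small word-level merge. This is only a sketch under the assumption that overlap means "the new window starts with words that end the previous one"; `merge_overlapping` is a hypothetical name and the real logic also deals with timing.

```rust
/// Merge `incoming` into `existing`, dropping the longest run of words that
/// both ends `existing` and starts `incoming`. Hypothetical helper.
pub fn merge_overlapping<'a>(existing: &[&'a str], incoming: &[&'a str]) -> Vec<&'a str> {
    let max = existing.len().min(incoming.len());
    // Try the longest candidate overlap first; n = 0 always matches.
    let overlap = (0..=max)
        .rev()
        .find(|&n| existing[existing.len() - n..] == incoming[..n])
        .unwrap_or(0);
    let mut merged = existing.to_vec();
    merged.extend_from_slice(&incoming[overlap..]);
    merged
}
```

For example, merging `["the", "quick", "brown"]` with `["quick", "brown", "fox"]` yields `["the", "quick", "brown", "fox"]`, while windows with no shared words are simply concatenated.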

@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


mut params: SessionParams,
state: &mut RootState,
) -> Result<(), StartSessionError> {
params.transcription_mode = params.effective_transcription_mode();

Lifecycle event loses original requested transcription mode

Medium Severity

params.transcription_mode is overwritten by effective_transcription_mode() before the SessionLifecycleEvent::Active event is emitted. The event then uses the mutated value for both requested_transcription_mode and current_transcription_mode, making them always identical. The requested_transcription_mode field is meant to capture what the caller originally asked for, but it instead reflects the server-side override.
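One possible shape for a fix is to capture the requested mode before the override. The sketch below is not the PR's code: the `TranscriptionMode` variants, the `model_supports_live` flag, and the `ActiveEvent` struct are all assumptions standing in for the real types.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TranscriptionMode {
    Live,  // assumed variant name
    Batch, // assumed variant name
}

pub struct SessionParams {
    pub transcription_mode: TranscriptionMode,
    /// Hypothetical flag standing in for whatever the real check inspects.
    pub model_supports_live: bool,
}

impl SessionParams {
    /// Assumed rule: fall back to batch when the model cannot stream.
    pub fn effective_transcription_mode(&self) -> TranscriptionMode {
        if self.model_supports_live {
            self.transcription_mode
        } else {
            TranscriptionMode::Batch
        }
    }
}

/// Stand-in for the payload of `SessionLifecycleEvent::Active`.
pub struct ActiveEvent {
    pub requested_transcription_mode: TranscriptionMode,
    pub current_transcription_mode: TranscriptionMode,
}

pub fn start_session(mut params: SessionParams) -> ActiveEvent {
    // Capture the caller's request before it is overwritten.
    let requested = params.transcription_mode;
    params.transcription_mode = params.effective_transcription_mode();
    ActiveEvent {
        requested_transcription_mode: requested,
        current_transcription_mode: params.transcription_mode,
    }
}
```

With this ordering, a caller that asks for live mode on a batch-only model would see `requested = Live` and `current = Batch` in the event instead of two identical values.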

