feat(media): add option to duck media volume instead of pausing #459
akar016012 wants to merge 7 commits into
Add a "Lower Volume Instead of Pausing" sub-option to "Pause Media During Transcription". When enabled, FluidVoice lowers the system output volume while you dictate and restores it afterward instead of fully stopping playback, which is useful for keeping a video audible but quiet during narration.
- Add SystemAudioVolumeController, a CoreAudio wrapper that reads/sets the default output device volume (master element with per-channel fallback).
- Extend MediaPlaybackService to either pause or duck based on the setting, tracking which action was taken so it reverts exactly what it applied. On restore it leaves the volume untouched if the user changed it mid-dictation, and it falls back to pausing if the volume can't be lowered.
- Add duckMediaInsteadOfPausing and duckMediaVolumeLevel settings (level defaults to 20%, clamped 5–100%), including backup/restore support.
- Add the nested toggle and a level slider to Settings, shown only when media pausing is enabled.
Note: CoreAudio output volume is system-wide, so ducking lowers all output from the default device, not just a single app. Behavior is arm64-only, consistent with the existing pause feature.
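As a rough illustration of the level setting described above, here is a minimal sketch of the clamp and the resulting duck target. The function names are hypothetical and not taken from the PR; the only facts used are the 5–100% clamp and that the level is a fraction of the current volume.

```swift
import Foundation

/// Hypothetical helper (not the PR's code): clamps a stored duck level to 5–100%.
func clampedDuckLevel(_ raw: Double) -> Double {
    min(max(raw, 0.05), 1.0)
}

/// The level is a fraction of the current volume, so the target is a simple product.
func duckedVolume(current: Float, level: Double) -> Float {
    current * Float(clampedDuckLevel(level))
}

// With the default level of 20%, a volume of 0.8 is ducked to roughly 0.16.
let exampleTarget = duckedVolume(current: 0.8, level: 0.2)
print(exampleTarget)
```

Clamping at read time rather than only in the slider would also cover values restored from an older or hand-edited backup, though whether the PR does this is not stated here.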
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0618a4e76d
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
…backs
MediaRemoteAdapter can fire the getTrackInfo callback more than once (the existing resumeOnce one-shot already guards the continuation against this). The duck side effect, however, ran on every callback before the gate. Because ducking is not idempotent (it reads the current output volume as the "original"), a duplicate callback re-ducked the already-lowered volume and overwrote activeSuppression with the ducked value, so the later restore only returned to the ducked level instead of the user's original volume.
Route applySuppression() through resumeOnce's one-shot gate so it runs exactly once, for the winning callback, before the continuation resumes. Pause mode was unaffected (pause() is idempotent); this only mattered for ducking.
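The one-shot idea can be sketched as follows. This is an assumed shape, not the PR's resumeOnce implementation: OneShotGate is a hypothetical type, and the closure stands in for calling applySuppression() and then resuming the continuation.

```swift
import Foundation

/// Hypothetical one-shot gate: the first caller wins, later callers are ignored.
final class OneShotGate {
    private let lock = NSLock()
    private var fired = false

    /// Runs `body` only for the first call and reports whether this call won.
    @discardableResult
    func run(_ body: () -> Void) -> Bool {
        lock.lock()
        let isFirst = !fired
        fired = true
        lock.unlock()
        if isFirst { body() }
        return isFirst
    }
}

var duckCount = 0
let gate = OneShotGate()

// Simulate the adapter firing its callback twice.
for _ in 0..<2 {
    // Stand-in for applySuppression() followed by resuming the continuation.
    gate.run { duckCount += 1 }
}
print(duckCount)
```

Putting the side effect inside the gate, rather than before it, is what keeps a non-idempotent action such as ducking from running on the duplicate callback.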
💡 Codex Review
Reviewed commit: 32457dc15b
On output devices with no settable master volume, the controller fell back to per-channel writes while reading volume as the average of the channels. Ducking then captured that average as the "original" and, on restore, wrote it back to every channel, permanently flattening a user's non-centered left/right balance after a dictation session (and it did not self-heal).
Replace the scalar get/set API with an OutputVolumeSnapshot that records each element (master or individual channels) and its level:
- capture the full per-channel state before ducking,
- duck by scaling every channel by the same factor (preserving balance),
- restore each channel to its captured value.
The snapshot also carries the device id, so restore and the "did the user change it?" check operate on the device that was actually ducked even if the default output device changes mid-dictation.
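To make the balance-preserving step concrete, here is a small model of the snapshot idea. The field names and the dictionary layout are assumptions for illustration; the PR's actual OutputVolumeSnapshot may be shaped differently, and no CoreAudio calls are made here.

```swift
import Foundation

/// Hypothetical model of a captured output state; field names are assumptions.
struct OutputVolumeSnapshot: Equatable {
    /// The device that was captured (CoreAudio device IDs are 32-bit unsigned values).
    let deviceID: UInt32
    /// Element number (0 = master, 1 and up = channels) mapped to its scalar volume.
    let levels: [UInt32: Float]

    /// Scales every element by the same factor, so the left/right ratio is unchanged.
    func scaled(by factor: Float) -> OutputVolumeSnapshot {
        OutputVolumeSnapshot(
            deviceID: deviceID,
            levels: levels.mapValues { min(max($0 * factor, 0), 1) }
        )
    }
}

// A device with an off-center balance: left at 0.8, right at 0.4.
let original = OutputVolumeSnapshot(deviceID: 42, levels: [1: 0.8, 2: 0.4])
let ducked = original.scaled(by: 0.5)

// Restoring means writing `original.levels` back element by element,
// which returns each channel to its own captured value.
print(ducked.levels)
```

Because restore writes the captured per-element values rather than an average, the 2:1 left/right ratio in this example survives both the duck and the restore.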
💡 Codex Review
Reviewed commit: d421045fb6
TranscriptionSoundPlayer's independent-volume mode also writes the system output volume and schedules a fire-and-forget async restore. That made it a second, uncoordinated owner of the same volume the new ducking feature controls, with two race outcomes:
- the start cue (played just before asr.start()) sets system volume and the duck then captures that transient level as the "original"; the cue's late restore undoes the duck mid-session;
- the stop cue (played while ducked) saves the ducked level and, if its restore fires after resumeIfWePaused, overwrites the final restore and leaves the Mac stuck at the ducked level.
When ducking is the active media behavior (pause + lower-volume both on), it now owns the system volume: the cue plays at its own AVAudioPlayer volume and no longer hijacks/restores system volume, eliminating the race. Independent cue volume is unaffected when ducking is off.
…ked device
Two edge-case fixes found in self-review of the ducking path:
- captureOutputVolume() preferred the master element whenever it was *readable*, but apply() requires it to be *settable*. On a device with a read-only master and settable per-channel volumes, ducking would capture the master, fail to apply, and needlessly fall back to pausing. Capture now records only settable elements (master if settable, otherwise the settable channels), keeping capture and restore symmetric.
- Replaced the post-duck re-capture and the restore-time change check, which re-resolved the *default* output device, with reread(snapshot:) that reads the snapshot's own device/elements. This keeps the quantization read-back accurate and is unaffected if the default output device changes mid-session.
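Both fixes can be modeled without touching CoreAudio. The sketch below is illustrative only: VolumeElement, elementsToCapture, and isUnchanged are hypothetical names, and the 0.01 tolerance is an assumed allowance for device quantization rather than a value from the PR.

```swift
import Foundation

/// Hypothetical description of one volume element on an output device.
struct VolumeElement: Equatable {
    let element: UInt32      // 0 = master, 1 and up = individual channels
    let level: Float         // scalar volume in 0...1
    let isSettable: Bool
}

/// Picks the elements to capture: the master if it is settable, otherwise the settable channels.
/// A readable but read-only master is skipped, so capture matches what can later be written.
func elementsToCapture(master: VolumeElement?, channels: [VolumeElement]) -> [VolumeElement] {
    if let master = master, master.isSettable { return [master] }
    return channels.filter { $0.isSettable }
}

/// True when the levels read back now still match what ducking applied.
/// The tolerance is an assumption, meant to absorb device quantization.
func isUnchanged(applied: [VolumeElement], current: [VolumeElement], tolerance: Float = 0.01) -> Bool {
    guard applied.count == current.count else { return false }
    return zip(applied, current).allSatisfy { pair in
        pair.0.element == pair.1.element && abs(pair.0.level - pair.1.level) <= tolerance
    }
}

let readOnlyMaster = VolumeElement(element: 0, level: 0.5, isSettable: false)
let left = VolumeElement(element: 1, level: 0.8, isSettable: true)
let right = VolumeElement(element: 2, level: 0.4, isSettable: true)

// The read-only master is ignored and both settable channels are captured instead.
let captured = elementsToCapture(master: readOnlyMaster, channels: [left, right])
print(captured.map { $0.element })
```

In the real controller the `current` values would come from rereading the snapshot's own device and elements, not from re-resolving the default output device.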
💡 Codex Review
Reviewed commit: ef88004076
💡 Codex Review
Reviewed commit: 16d4ebe173
The ducking-precedence guard was keyed on settings alone, so it also forced independent cue volume off for Settings previews — which never run during a dictation session and have no ducking conflict to avoid. Scope the precedence to the actual session start/stop cues via an enforceDuckingPrecedence flag; previews pass false and always honor the independent-volume setting. Session cues still defer to ducking, since the start cue fires before playback state is known.
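The scoping described above might look roughly like this. Only enforceDuckingPrecedence and duckMediaInsteadOfPausing are names from the PR; CueSettings, its other fields, and the function name are hypothetical stand-ins.

```swift
import Foundation

/// Hypothetical settings bundle; only duckMediaInsteadOfPausing is a name from the PR.
struct CueSettings {
    var pauseMediaDuringTranscription: Bool
    var duckMediaInsteadOfPausing: Bool
    var independentCueVolumeEnabled: Bool
}

/// Decides whether a cue may set and restore the system volume itself.
/// Session cues pass `true` and defer to ducking; Settings previews pass `false`.
func usesIndependentSystemVolume(_ settings: CueSettings, enforceDuckingPrecedence: Bool) -> Bool {
    let duckingOwnsVolume = settings.pauseMediaDuringTranscription
        && settings.duckMediaInsteadOfPausing
    if enforceDuckingPrecedence && duckingOwnsVolume { return false }
    return settings.independentCueVolumeEnabled
}

let cueSettings = CueSettings(
    pauseMediaDuringTranscription: true,
    duckMediaInsteadOfPausing: true,
    independentCueVolumeEnabled: true
)

let sessionCue = usesIndependentSystemVolume(cueSettings, enforceDuckingPrecedence: true)
let previewCue = usesIndependentSystemVolume(cueSettings, enforceDuckingPrecedence: false)
print(sessionCue, previewCue)
```

Passing the flag at the call site keeps the decision with the caller, which knows whether it is a session cue or a preview, instead of inferring it from settings alone.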
stop() sets isRunning=false before its final transcription pass and only reverts media afterwards, so a new dictation can start during that window. With a single shared activeSuppression, the second session would capture the already-ducked volume as its "original," the first session's revert would then clear the newer snapshot, and the Mac could be left stuck at the ducked level. pauseIfPlaying() now bails out (returning false, "no new action") when a suppression is already active, so the in-flight revert from the prior session remains the sole owner of the system volume and restores the true original.
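A toy model of the guard shows why the early return matters. It borrows the names pauseIfPlaying, resumeIfWePaused, and activeSuppression from the discussion, but the class, the Suppression enum, and the in-memory volume are assumptions made purely for illustration.

```swift
import Foundation

/// Hypothetical record of what was applied, so the revert can undo exactly that.
enum Suppression: Equatable {
    case paused
    case ducked(originalLevel: Float)
}

/// Toy model: the "system volume" is just a stored property here.
final class MediaSuppressionModel {
    private(set) var activeSuppression: Suppression?
    private(set) var systemVolume: Float

    init(systemVolume: Float) {
        self.systemVolume = systemVolume
    }

    /// Returns false ("no new action") when a suppression is already active,
    /// so an overlapping session never captures the ducked level as its original.
    func pauseIfPlaying(duckLevel: Float) -> Bool {
        guard activeSuppression == nil else { return false }
        activeSuppression = .ducked(originalLevel: systemVolume)
        systemVolume *= duckLevel
        return true
    }

    /// Reverts whatever was applied and clears the record.
    func resumeIfWePaused() {
        if case .ducked(let original)? = activeSuppression {
            systemVolume = original
        }
        activeSuppression = nil
    }
}

let model = MediaSuppressionModel(systemVolume: 0.8)
let firstSession = model.pauseIfPlaying(duckLevel: 0.5)   // ducks the volume
let secondSession = model.pauseIfPlaying(duckLevel: 0.5)  // bails out, nothing captured
model.resumeIfWePaused()                                   // first session's revert
print(firstSession, secondSession, model.systemVolume)
```

Without the guard, the second call would have recorded the ducked level as the original and the final restore would have stopped there.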
💡 Codex Review
Reviewed commit: 9169d9f175
@altic-dev Could you take a look at this PR and approve it if you want to go ahead with this feature? Please release a new version, or maybe a pre-release, so that people can start testing. Let me know if there are any issues and I'll work on them right away.
I think there was a stale PR on this. Did you check if it's the same stuff? Would love to merge it once it's cleared up and can release as part of the next version later :) Don't think this could be a release by itself ahah.
What
Adds a "Lower Volume Instead of Pausing" sub-option under the existing Pause Media During Transcription setting. When enabled, FluidVoice lowers the system output volume while you dictate and restores it afterward, instead of fully stopping playback — handy for keeping a video audible but quiet during narration/dictation.
Why
Today the only choice is to fully pause media during transcription. Some users would rather keep media playing at a lower volume than have it stop and resume.
Changes
- SystemAudioVolumeController (new): a small CoreAudio wrapper that reads/sets the default output device's volume (master element with a per-channel fallback).
- MediaPlaybackService: now either pauses or ducks based on the setting, recording which action it took so it reverts exactly what it applied. On restore it leaves the volume untouched if the user changed it mid-dictation. If the volume can't be lowered, it falls back to pausing.
- SettingsStore + BackupService: adds duckMediaInsteadOfPausing (Bool) and duckMediaVolumeLevel (fraction of current volume, default 20%, clamped 5–100%), including backup/restore with backward-compatible decoding.
- SettingsView: nested toggle + level slider, shown only when media pausing is enabled.
Notes / trade-offs
- CoreAudio output volume is system-wide, so ducking lowers all output from the default device, not just a single app.
- Behavior is arm64-only, consistent with the existing pause feature (… MediaRemoteAdapter for playback detection). Intel remains a no-op.
Testing
Verified the new CoreAudio file with swiftc -typecheck against the macOS SDK, and swiftc -parse on all edited files (clean). I was able to run the application using my team ID on an M1 Pro MacBook Pro and verified the functionality.
Type of Change