feat(media): add option to duck media volume instead of pausing#459

Open
akar016012 wants to merge 7 commits into
altic-dev:mainfrom
akar016012:feature/duck-media-volume

@akar016012 akar016012 commented Jun 28, 2026

What

Adds a "Lower Volume Instead of Pausing" sub-option under the existing Pause Media During Transcription setting. When enabled, FluidVoice lowers the system output volume while you dictate and restores it afterward, instead of fully stopping playback — handy for keeping a video audible but quiet during narration/dictation.

Why

Today the only choice is to fully pause media during transcription. Some users would rather keep media playing at a lower volume than have it stop and resume.

Changes

  • SystemAudioVolumeController (new): a small CoreAudio wrapper that reads/sets the default output device's volume (master element with a per-channel fallback).
  • MediaPlaybackService: now either pauses or ducks based on the setting, recording which action it took so it reverts exactly what it applied. On restore it:
    • leaves the volume untouched if the user changed it mid-dictation, and
    • falls back to pausing if the volume can't be lowered.
  • Settings (SettingsStore + BackupService): adds duckMediaInsteadOfPausing (Bool) and duckMediaVolumeLevel (fraction of current volume, default 20%, clamped 5–100%), including backup/restore with backward-compatible decoding.
  • UI (SettingsView): nested toggle + level slider, shown only when media pausing is enabled.
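The level setting described above amounts to a two-step calculation: clamp the stored fraction, then multiply it by the current volume. A minimal sketch of that arithmetic follows. The function names are hypothetical and are not taken from the PR's code; only the 5–100% range and the "fraction of current volume" semantics come from the description.

```swift
import Foundation

/// Hypothetical helper (name and signature are assumptions): clamps the
/// stored duck level to the 5–100% range given in the PR description.
func clampedDuckLevel(_ stored: Double) -> Double {
    min(max(stored, 0.05), 1.0)
}

/// Hypothetical helper: the ducked volume is a fraction of the *current*
/// output volume, not an absolute level.
func duckedVolume(current: Float, level: Double) -> Float {
    current * Float(clampedDuckLevel(level))
}
```

With the default level of 20%, a current volume of 0.5 would duck to roughly 0.1 under this reading.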

Notes / trade-offs

  • CoreAudio output volume is system-wide, so ducking lowers all output from the default device, not just a single app. macOS does not expose per-app media ducking to third-party apps. The UI copy makes this explicit.
  • Behavior is arm64-only, consistent with the existing pause feature (which relies on MediaRemoteAdapter for playback detection). Intel remains a no-op.
  • No entitlement changes required (app is not sandboxed).

Testing

Verified the new CoreAudio file with swiftc -typecheck against the macOS SDK, and ran swiftc -parse on all edited files (clean). I also built and ran the application with my own team ID on an M1 Pro MacBook Pro and verified the functionality.

Type of Change

  • 🐞 Bug fix (non-breaking change which fixes an issue)
  • ✨ New feature (non-breaking change which adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • 🧹 Chore
  • 📝 Documentation update
image

Add a "Lower Volume Instead of Pausing" sub-option to "Pause Media During
Transcription". When enabled, FluidVoice lowers the system output volume
while you dictate and restores it afterward, instead of fully stopping
playback — useful for keeping a video audible but quiet during narration.

- Add SystemAudioVolumeController, a CoreAudio wrapper that reads/sets the
  default output device volume (master element with per-channel fallback).
- Extend MediaPlaybackService to either pause or duck based on the setting,
  tracking which action was taken so it reverts exactly what it applied. On
  restore it leaves the volume untouched if the user changed it mid-dictation,
  and falls back to pausing if the volume can't be lowered.
- Add duckMediaInsteadOfPausing and duckMediaVolumeLevel settings (level
  defaults to 20%, clamped 5–100%), including backup/restore support.
- Add the nested toggle and a level slider to Settings, shown only when
  media pausing is enabled.

Note: CoreAudio output volume is system-wide, so ducking lowers all output
from the default device, not just a single app. Behavior is arm64-only,
consistent with the existing pause feature.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0618a4e76d

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread Sources/Fluid/Services/MediaPlaybackService.swift Outdated
…backs

MediaRemoteAdapter can fire the getTrackInfo callback more than once (the
existing resumeOnce one-shot already guards the continuation against this).
The duck side effect, however, ran on every callback before the gate. Since
ducking is not idempotent — it reads the current output volume as the
"original" — a duplicate callback re-ducked the already-lowered volume and
overwrote activeSuppression with the ducked value, so the later restore only
returned to the ducked level instead of the user's original volume.

Route applySuppression() through resumeOnce's one-shot gate so it runs exactly
once, for the winning callback, before the continuation resumes. Pause mode was
unaffected (pause() is idempotent); this only mattered for ducking.
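The one-shot idea in this commit message can be sketched as a small gate that runs a side effect for the first caller only. This is an illustration under assumptions: `OneShotGate` and `simulateDuplicateCallbacks` are hypothetical names, and the PR's actual `resumeOnce` implementation is not shown on this page.

```swift
import Foundation

/// Hypothetical one-shot gate: only the first caller runs its body.
final class OneShotGate {
    private let lock = NSLock()
    private var fired = false

    /// Returns true if this call was the first one and therefore ran `body`.
    @discardableResult
    func run(_ body: () -> Void) -> Bool {
        lock.lock()
        let isFirst = !fired
        fired = true
        lock.unlock()
        guard isFirst else { return false }
        body() // non-idempotent side effect, e.g. ducking, runs at most once
        return true
    }
}

/// Simulates a callback that fires `count` times and reports how many
/// times the gated side effect actually ran.
func simulateDuplicateCallbacks(count: Int) -> Int {
    let gate = OneShotGate()
    var applied = 0
    for _ in 0..<count {
        gate.run { applied += 1 }
    }
    return applied
}
```

Placing the side effect inside the gate, rather than before it, is what keeps a duplicate callback from reading the already-lowered volume as the "original".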

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 32457dc15b

Comment thread Sources/Fluid/Services/SystemAudioVolumeController.swift Outdated
On output devices with no settable master volume, the controller fell back
to per-channel writes while reading volume as the average of the channels.
Ducking then captured that average as the "original" and, on restore, wrote
it back to every channel — permanently flattening a user's non-centered
left/right balance after a dictation session (and it did not self-heal).

Replace the scalar get/set API with an OutputVolumeSnapshot that records each
element (master or individual channels) and its level:
- capture the full per-channel state before ducking,
- duck by scaling every channel by the same factor (preserving balance),
- restore each channel to its captured value.

The snapshot also carries the device id, so restore and the "did the user
change it?" check operate on the device that was actually ducked even if the
default output device changes mid-dictation.
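A rough model of the snapshot approach, without any CoreAudio calls, might look like the following. `VolumeSnapshot` is a hypothetical stand-in for the PR's `OutputVolumeSnapshot`; its field names, the `isUnchanged` check, and the 0.01 tolerance are all assumptions made for illustration.

```swift
import Foundation

/// Hypothetical stand-in for the PR's OutputVolumeSnapshot.
struct VolumeSnapshot: Equatable {
    /// The device that was captured (CoreAudio device IDs are UInt32).
    let deviceID: UInt32
    /// Volume per element; element 0 is the master element, 1, 2, ... are channels.
    let levels: [UInt32: Float]

    /// Scales every element by the same factor, so the left/right ratio survives.
    func scaled(by factor: Float) -> VolumeSnapshot {
        VolumeSnapshot(
            deviceID: deviceID,
            levels: levels.mapValues { min(max($0 * factor, 0), 1) }
        )
    }

    /// True if this reading matches `expected` on the same device within a
    /// tolerance (the tolerance value is an assumption, not from the PR).
    func isUnchanged(from expected: VolumeSnapshot, tolerance: Float = 0.01) -> Bool {
        guard deviceID == expected.deviceID,
              levels.count == expected.levels.count else { return false }
        return levels.allSatisfy { entry in
            guard let other = expected.levels[entry.key] else { return false }
            return abs(entry.value - other) <= tolerance
        }
    }
}
```

For example, a device captured at left 0.8 and right 0.4 and scaled by 0.25 ducks to about 0.2 and 0.1, keeping the 2:1 balance. On restore, a fresh reading that is no longer "unchanged" relative to the ducked state would indicate that the user adjusted the volume, in which case the original is not written back.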

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d421045fb6

Comment thread Sources/Fluid/Services/MediaPlaybackService.swift
TranscriptionSoundPlayer's independent-volume mode also writes the system
output volume and schedules a fire-and-forget async restore. That made it a
second, uncoordinated owner of the same volume the new ducking feature
controls, with two race outcomes:

- the start cue (played just before asr.start()) sets system volume and the
  duck then captures that transient level as the "original"; the cue's late
  restore undoes the duck mid-session;
- the stop cue (played while ducked) saves the ducked level and, if its
  restore fires after resumeIfWePaused, overwrites the final restore and
  leaves the Mac stuck at the ducked level.

When ducking is the active media behavior (pause + lower-volume both on), it
now owns the system volume: the cue plays at its own AVAudioPlayer volume and
no longer hijacks/restores system volume, eliminating the race. Independent
cue volume is unaffected when ducking is off.

…ked device

Two edge-case fixes found in self-review of the ducking path:

- captureOutputVolume() preferred the master element whenever it was *readable*,
  but apply() requires it to be *settable*. On a device with a read-only master
  and settable per-channel volumes, ducking would capture the master, fail to
  apply, and needlessly fall back to pausing. Capture now records only settable
  elements (master if settable, otherwise the settable channels), keeping capture
  and restore symmetric.

- Replaced the post-duck re-capture and the restore-time change check, which
  re-resolved the *default* output device, with reread(snapshot:) that reads the
  snapshot's own device/elements. This keeps the quantization read-back accurate
  and is unaffected if the default output device changes mid-session.
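The first fix, capturing only what can later be written back, can be illustrated with a small selection function. `ElementInfo` and `elementsToCapture` are hypothetical names; the real controller queries CoreAudio for settability, which this sketch replaces with a plain Boolean.

```swift
import Foundation

/// Hypothetical description of one volume element on an output device.
struct ElementInfo {
    let element: UInt32   // 0 = master, 1, 2, ... = channels
    let isSettable: Bool  // in real code this would come from a CoreAudio query
}

/// Picks the elements to capture: the master if it is settable, otherwise
/// only the settable channels. A readable but read-only master is skipped.
func elementsToCapture(master: ElementInfo?, channels: [ElementInfo]) -> [UInt32] {
    if let m = master, m.isSettable {
        return [m.element]
    }
    return channels.filter { $0.isSettable }.map { $0.element }
}
```

An empty result would mean nothing on the device can be ducked, which is the case where falling back to pausing makes sense.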

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ef88004076

Comment thread Sources/Fluid/Services/SystemAudioVolumeController.swift Outdated
Comment thread Sources/Fluid/Services/TranscriptionSoundPlayer.swift Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 16d4ebe173

Comment thread Sources/Fluid/Services/MediaPlaybackService.swift
The ducking-precedence guard was keyed on settings alone, so it also forced
independent cue volume off for Settings previews — which never run during a
dictation session and have no ducking conflict to avoid.

Scope the precedence to the actual session start/stop cues via an
enforceDuckingPrecedence flag; previews pass false and always honor the
independent-volume setting. Session cues still defer to ducking, since the
start cue fires before playback state is known.
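The precedence rule reads naturally as a small decision function. In this sketch, `duckMediaInsteadOfPausing` and `enforceDuckingPrecedence` are names from the PR, while `CueSettings`, its other fields, and `cueMayTouchSystemVolume` are hypothetical and chosen only for illustration.

```swift
import Foundation

/// Hypothetical bundle of the settings involved in the decision.
struct CueSettings {
    var pauseMediaDuringTranscription: Bool
    var duckMediaInsteadOfPausing: Bool
    var independentCueVolume: Bool
}

/// Returns true if the cue may set and later restore the system volume.
/// Session cues pass enforceDuckingPrecedence = true; previews pass false.
func cueMayTouchSystemVolume(_ settings: CueSettings,
                             enforceDuckingPrecedence: Bool) -> Bool {
    guard settings.independentCueVolume else { return false }
    let duckingActive = settings.pauseMediaDuringTranscription
        && settings.duckMediaInsteadOfPausing
    // When ducking owns the system volume, session cues must not touch it.
    if enforceDuckingPrecedence && duckingActive { return false }
    return true
}
```

Under this reading, a Settings preview with ducking enabled still honors the independent-volume setting, while the start and stop cues of a real session defer to ducking.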

stop() sets isRunning=false before its final transcription pass and only
reverts media afterwards, so a new dictation can start during that window.
With a single shared activeSuppression, the second session would capture the
already-ducked volume as its "original," the first session's revert would
then clear the newer snapshot, and the Mac could be left stuck at the ducked
level.

pauseIfPlaying() now bails out (returning false, "no new action") when a
suppression is already active, so the in-flight revert from the prior session
remains the sole owner of the system volume and restores the true original.
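The single-owner guard can be sketched as follows. `SuppressionTracker` and its methods are hypothetical; in the PR the equivalent state lives in `MediaPlaybackService` as `activeSuppression`, and this sketch leaves out any synchronization the real service may need.

```swift
import Foundation

/// Hypothetical tracker for the one suppression that may be active at a time.
final class SuppressionTracker {
    enum Suppression: Equatable {
        case paused
        case ducked(originalVolume: Float)
    }

    private(set) var active: Suppression?

    /// Records a new suppression. Returns false ("no new action") if one is
    /// already active, so an overlapping session cannot overwrite the original.
    func begin(_ suppression: Suppression) -> Bool {
        guard active == nil else { return false }
        active = suppression
        return true
    }

    /// Ends the active suppression and hands it back for reverting.
    func end() -> Suppression? {
        let finished = active
        active = nil
        return finished
    }
}
```

If a second session starts while the first is still reverting, its `begin` call is refused, and the first session's `end` still returns the true original volume.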

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9169d9f175

Comment thread Sources/Fluid/Services/MediaPlaybackService.swift
@akar016012

Author

@altic-dev Could you take a look at this PR and approve it if you want to go ahead with this feature? Please release a new version, or maybe a pre-release, so that people can start testing. Let me know if there are any issues and I'll work on them right away.

@altic-dev

Owner

I think there was a stale PR on this. Did you check if it's the same stuff? Would love to merge it once it's cleared up and can release as part of next version later :)

Don't think this could be a release by itself ahah.
