Add adjustable tail recording to capture the last word #426
EliseiNicolae wants to merge 1 commit into
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d293442db6
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
04b24d7 to 16b530d
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 16b530d9e8
16b530d to ab959d0
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ab959d0b95
Amazing job making it configurable. I don't want to ship it by default, since people would mostly complain about how slow it is compared to other apps, so making it configurable is a good way to recover the last word when needed. I wonder if we can look into optimizing the CoreAudio capture so it gets the last word without the 100 ms delay in the first place. Also, nice demo; it shows the feature off immediately.
Okay, I've looked into this issue and I feel there are better ways to optimize this than patching it with an extra delay. I think it should capture everything automatically, with a 20 to 50 millisecond delay to capture everything as an optional extra. That way, the user wouldn't have to worry about what to do or what timing to pick. Look into timestamping CoreAudio frames and processing based on that; it could be a better way to do this. If that doesn't work or adds too much latency, we can go ahead and merge this.
Description
Dictation was clipping the last word. When you trigger stop, the final ~50–130 ms of audio is still in CoreAudio's input pipeline (and people tend to release the key a hair early), but capture freezes immediately — so the tail of the last word is lost.
This keeps the mic tap live for a short, configurable grace period after the stop trigger, so the trailing audio lands in the buffer before the engine tears down. The duration is exposed as a slider in Settings.
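The stop sequence described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `TailRecorder`, `teardownCount`, and the use of an `actor` are hypothetical stand-ins (the real `ASRService` may well be a main-actor class), and the boolean `recordingEnabled` only mimics what `audioCapturePipeline.setRecordingEnabled(false)` would switch off.

```swift
import Foundation

/// Hypothetical stand-in for the service that owns the mic tap.
actor TailRecorder {
    private(set) var isRunning = true
    private(set) var recordingEnabled = true
    private(set) var teardownCount = 0   // illustration only: counts real teardowns
    private var isStopping = false
    private let tailDuration: TimeInterval

    init(tailDuration: TimeInterval) {
        self.tailDuration = tailDuration
    }

    func stop() async {
        // While the grace period is pending, isRunning is still true,
        // so a second flag is needed to reject overlapping stop() calls.
        guard isRunning, !isStopping else { return }
        isStopping = true
        defer { isStopping = false }

        // Grace period: the tap stays live so trailing audio can still land.
        if tailDuration > 0 {
            try? await Task.sleep(nanoseconds: UInt64(tailDuration * 1_000_000_000))
        }

        // Only now is capture switched off; audio arriving later is dropped.
        recordingEnabled = false
        isRunning = false
        teardownCount += 1
    }
}
```

Because the sleep is a suspension point, a second `stop()` can begin while the first is still waiting; the extra flag is what makes that second call a no-op in this sketch.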
Type of Change
Related Issues
What changed
- `ASRService.stop()` now `await`s a short grace period before `audioCapturePipeline.setRecordingEnabled(false)`, so the tap keeps appending the trailing audio. The exact cessation point (the tap's `guard enabled` check) is why the delay has to go before that line: anything after it is already discarded.
- `isStopping` flag to close the re-entrancy window the delay opens: while `stop()` is sleeping, `isRunning` is still `true`, so the un-guarded playground/Welcome stop path could otherwise slip in a second `stop()`.
- `SettingsStore.recordingTailDuration` (seconds, default 0.2, clamped 0–0.4). Read live at stop time, so a change applies on the very next dictation. `0` disables the tail entirely (the getter uses a `nil`-check so `0` persists instead of snapping back to the default).
- […] (`stopWithoutTranscription()`) is intentionally left undelayed.
Testing
- `swiftlint --strict --config .swiftlint.yml Sources`
- `swiftformat --config .swiftformat Sources`
Built Release (arm64) successfully, signed, installed, and launched on Apple Silicon / macOS 26. End-to-end dictation tail behavior (and dragging the slider) should be confirmed manually, since it needs live mic input.
Notes
- […] `tail_has_audio` guard, so finalize runs a full transcription instead of reusing the streaming preview: slightly slower finalize on that path, but that's exactly what recovers the last word.
- […] `stop()` is called, and `isProcessingActive` keeps it visible while `isRunning` flips a beat later.
Screenshots / Video
The new control lives in Settings → Global Hotkey card → Options, directly under "Activation Mode" — label "Extra recording after stop", a 0.00–0.40 s slider showing the live value. Screenshot to be added.
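For reference, the setting behind this slider (default 0.2 s, clamped to 0–0.4 s, with `0` persisting thanks to a `nil`-check) might look something like the sketch below. It assumes the value is backed by `UserDefaults`, which the PR text does not state; `TailSetting` and the key string are hypothetical names chosen for illustration.

```swift
import Foundation

/// Hypothetical wrapper; the real property lives on SettingsStore.
struct TailSetting {
    static let key = "recordingTailDuration"   // assumed key name
    static let defaultValue = 0.2              // seconds
    static let range = 0.0...0.4               // seconds

    let defaults: UserDefaults

    private static func clamp(_ value: Double) -> Double {
        min(max(value, range.lowerBound), range.upperBound)
    }

    var recordingTailDuration: Double {
        get {
            // object(forKey:) returns nil when nothing was ever stored, whereas
            // double(forKey:) returns 0 in that case, so an explicit nil-check
            // is what lets a stored 0 survive instead of becoming the default.
            guard let stored = defaults.object(forKey: Self.key) as? Double else {
                return Self.defaultValue
            }
            return Self.clamp(stored)
        }
        nonmutating set {
            defaults.set(Self.clamp(newValue), forKey: Self.key)
        }
    }
}
```

Clamping on both read and write keeps an out-of-range value from reaching the stop path even if it was written by an older build or edited by hand.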