Make SilentProcessRunner.Execute async#1236
Conversation
Replaces the sync WaitForExit() with await WaitForExitAsync(cancel). The cancel token is passed directly so the existing cancel semantics are preserved: cancel firing throws OCE from the await and unwinds. DoOurBestToCleanUp continues to fire on cancel via cancel.Register exactly as it did in the sync version. Adds a net48 polyfill for WaitForExitAsync using Process.Exited + TaskCompletionSource. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ISilentProcessRunner, SilentProcessRunnerWrapper, and the SilentProcessRunnerExtended helpers now return Task<int> and call SilentProcessRunner.ExecuteCommandAsync directly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CommandLineRunner.Execute is consumed by Octopus.Manager.Tentacle (a WPF app), so the public method stays sync. It now blocks on the new SilentProcessRunner.ExecuteCommandAsync via GetAwaiter().GetResult() — safe because the WPF callers dispatch through ThreadPool.QueueUserWorkItem, which has no synchronisation context. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
RunningScript now awaits SilentProcessRunner.ExecuteCommandAsync through a new RunScriptAsync helper. The monitored-startup path also awaits the async helper inside Task.Run. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The six immediate sync callers of SilentProcessRunner now go through ExecuteCommandAsync(...).GetAwaiter().GetResult(): - Octopus.Manager.Tentacle PowerShellPrerequisite (WPF installer) - KubernetesDirectoryInformationProvider (IMemoryCache factory) - SystemCtlHelper (2 sites — start and sudo retry) - LinuxServiceConfigurator (3 sites — chmod, systemctl probe, sudo probe) - WindowsServiceConfigurator (sc.exe wrapper) Each site gets a comment explaining why it must be sync and why blocking on a thread-pool worker is deadlock-safe. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Updates Kubernetes integration test setup helpers, PowerShell startup-detection tests, integration support, and Linux test fixtures to call the new async ExecuteCommandAsync API. Tests that don't await directly (NUnit static helpers, cache factories) block on .GetAwaiter().GetResult() and document why it's deadlock-safe on the test threadpool. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Rewrites the six sync-boundary comments to name the canonical pattern
("sync-over-async"), link Stephen Cleary's "Don't Block on Async Code"
reference, and keep the "we are here / we do this / safe because"
structure. Removes em-dashes per style preference.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| process.BeginErrorReadLine(); | ||
|
|
||
| process.WaitForExit(); | ||
| #if NETFRAMEWORK |
There was a problem hiding this comment.
Remove all changes from this method implementation, this should be a signature change only in this class
| } | ||
| } | ||
|
|
||
| #if NETFRAMEWORK |
There was a problem hiding this comment.
Remove this and put it in the next PR
|
|
||
| var exitCode = SilentProcessRunner.ExecuteCommand( | ||
| // Dispose() cannot be made async; .GetAwaiter().GetResult() is safe here | ||
| // because this runs in test teardown (not inside an async context with a sync-blocking SynchronizationContext). |
There was a problem hiding this comment.
This doesn't match our lighthouse example comment on the wpf installer. Specifically where it calls out the pattern and links the blog post
| public void Dispose() | ||
| { | ||
| var exitCode = SilentProcessRunner.ExecuteCommand( | ||
| // Dispose() cannot be made async; .GetAwaiter().GetResult() is safe here |
There was a problem hiding this comment.
This also doesn't follow our lighthouse comment example
|
|
||
| var chmodCmd = new CommandLineInvocation("/bin/bash", $"-c \"chmod 777 {scriptPath}\""); | ||
| chmodCmd.ExecuteCommand(); | ||
| // Safe: sync test helper, no synchronisation context. |
There was a problem hiding this comment.
Does not follow our lighthouse comment example
There was a problem hiding this comment.
Also there are many instances of this in this class. Can we create a helper method to dry it up or push this sync to async transition higher?
| log, | ||
| CancellationToken.None); | ||
| cancel: CancellationToken.None) | ||
| // Safe: static void helper, no synchronisation context. |
There was a problem hiding this comment.
Does not follow our lighthouse comment example
| { | ||
| var commandLineInvocation = new CommandLineInvocation("/bin/bash", arguments); | ||
| var result = commandLineInvocation.ExecuteCommand(); | ||
| // Safe: constructor-time helper, no synchronisation context. |
There was a problem hiding this comment.
Does not follow our lighthouse comment example
| } | ||
|
|
||
| // Sync-over-async is safe here: NUnit runs tests on a plain ThreadPool thread with no | ||
| // synchronisation context, so there is no risk of deadlock. |
There was a problem hiding this comment.
Does not follow our lighthouse comment example and is not at the actual call site
| var result = commandLineInvocation.ExecuteCommand(); | ||
| // We're in WriteUnitFile, called from IServiceConfigurator.ConfigureService implementations | ||
| // which are sync (called from the Tentacle service-management CLI), so we block on the | ||
| // async call with .GetAwaiter().GetResult(). |
There was a problem hiding this comment.
ITs not clear why these things have to be sync?
| // We're in WriteUnitFile, called from IServiceConfigurator.ConfigureService implementations | ||
| // which are sync (called from the Tentacle service-management CLI), so we block on the | ||
| // async call with .GetAwaiter().GetResult(). | ||
| // This is sync-over-async but is safe because the CLI dispatches us on a plain |
There was a problem hiding this comment.
How do we know the cli dispatches us on a thread pool worker?
| var commandLineInvocation = new CommandLineInvocation("/bin/bash", "-c \"command -v systemctl >/dev/null\""); | ||
| var result = commandLineInvocation.ExecuteCommand(); | ||
| // Same sync-over-async boundary as WriteUnitFile: CheckSystemPrerequisites is called | ||
| // from the sync ConfigureService path, on a plain thread-pool worker. |
There was a problem hiding this comment.
Does not follow our lighthouse comment example
| { | ||
| var commandLineInvocation = new CommandLineInvocation("/bin/bash", "-c \"sudo -vn 2> /dev/null\""); | ||
| var result = commandLineInvocation.ExecuteCommand(); | ||
| // Same sync-over-async boundary as IsSystemdInstalled. |
There was a problem hiding this comment.
Does not follow our lighthouse comment example
…, lighthouse comments, async tests - Strip SilentProcessRunner.cs to signature-only async change (move polyfill, EnableRaisingEvents, and await WaitForExitAsync to #1226 abandon PR) - Apply lighthouse comment pattern uniformly to all sync-over-async sites including test scaffolding and the second/third sites in LinuxServiceConfigurator - Improve LinuxServiceConfigurator justification: name the sync interface chain (IServiceConfigurator -> AbstractCommand -> ICommand -> Topshelf) and the no-SynchronizationContext property of Main/Topshelf worker threads - Move SilentProcessRunnerFixture sync-over-async comment to actual call site - Convert LinuxConfigureServiceHelperFixture test methods to async Task, eliminating sync-over-async entirely in that file Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| // synchronously, so we block on the async call with .GetAwaiter().GetResult(). | ||
| // This is sync-over-async but is safe because the installer dispatches us on a | ||
| // plain thread-pool worker. No captured SynchronizationContext, so no deadlock. | ||
| // See https://blog.stephencleary.com/2012/07/dont-block-on-async-code.html |
There was a problem hiding this comment.
@LukeButters do you have enough knowledge of this repo to verify Claude claims on each of these? Or shall I organise us a sync to go through them one by one
There was a problem hiding this comment.
Maybe @sburmanoctopus knows about this, he has done UI programming.
There was a problem hiding this comment.
From my WPF days, blocking would cause the UI to freeze, which wasn't great.
When async/await came along, it was amazing, because you could do async stuff as long as you added .ConfigureAwait(true) to the end. This would ensure the logic returned to the calling thread, which was the UI thread. This freed up the UI thread for other things while async code was running, and didn't block the UI thread anymore!
So by default, I would suggest using .ConfigureAwait(true).
However, in saying that, I have no context on where this code is being called, how it's being called etc. It's possible something else is dealing with the thread locking issue, or this is being called from a place where we must block the UI thread.
There was a problem hiding this comment.
Its being called from Winow.Start via DispatchHelper.Background here which queues it via Threadpool.QueueUserWorkItem which I think means that it is not on the UI thread and because Threadpool.QueueUserWorkItem isn't adding a synchronisation context we aren't suceptible to blocking?
I'd love it if someone could double check my thinking here. Claude can you give your 2c as well?
There was a problem hiding this comment.
Ah yes.
Yeah, the Start method is being called by the Loaded event, which is on the UI thread. But like you said, since the Start method calls ThreadPool.QueueUserWorkItem, any work done there is on a thread from the thread pool (so not the UI thread).
In which case, all we would be doing here is blocking the thread from the thread pool. If there aren't many processes being run on the thread pool, I don't think this is an issue. Especially for a manager UI.
It would be nice to see if we could make that whole call chain async, but I'm not sure that's worth the effort.
There was a problem hiding this comment.
Yeah I tried. It runs into an issue when another host doesn't have an async main entry point. I could push it to there. But I think I'm happy with where I've gotten to so far
| // This is sync-over-async but is safe because the cache factory runs on a plain | ||
| // thread-pool worker. No captured SynchronizationContext, so no deadlock. | ||
| // See https://blog.stephencleary.com/2012/07/dont-block-on-async-code.html | ||
| var exitCode = silentProcessRunner.ExecuteCommandAsync("du", $"-s -B 1 {directoryPath}", "/", stdOut.Add, stdErr.Add) |
There was a problem hiding this comment.
@jimmyp Might want check with modern deployments on this one or push the async.
There was a problem hiding this comment.
Took the second option @LukeButters: pushed the async sibling up so the in-process async caller awaits directly.
The two consumers of this method had different async-ness:
KubernetesScriptPodCreator.CreateScriptContainerisasyncend-to-end. It used to call syncGetStorageInformation()→ syncGetPathUsedBytes()→ block on async. That hop is gone —CreateScriptContainernowawaits a newGetStorageInformationAsync()/GetPathUsedBytesAsync()pair.KubernetesPhysicalFileSystem.EnsureDiskHasEnoughFreeSpaceis theIOctopusFileSystemoverride and is sync because the interface is sync. The syncGetPathUsedBytesstill exists for this one consumer.
So we still have one sync-over-async on the file-system override path. Hoping you can do a sanity check on it @liam-mackie since you wrote the original — the relevant safety claim is:
The Kubernetes agent is a console process (
static int Main), so .NET doesn't install a defaultSynchronizationContext. I've also confirmed by binary inspection that Halibut 8.1.1633 (the version Tentacle uses) contains no references toSynchronizationContextorSetSynchronizationContext— its request-handler threads don't install one either. As long as nothing in the K8s-agent host startup (pod watcher, bootstrap wrappers, anything you'd recognise) installs one, the remaining.GetAwaiter().GetResult()on theEnsureDiskHasEnoughFreeSpacepath is deadlock-safe.
An example of what would make it unsafe (so you know what to look for): something like SynchronizationContext.SetSynchronizationContext(new SomeCustomContext()) early in Main, or a UI-style context being installed by a library before the file-system code runs. Anything custom there would mean await inside GetPathUsedBytesAsync could try to resume on a thread that's blocked on .GetResult() — classic deadlock.
Does anything in the agent startup do that? If not, I'll leave the sync wrapper as-is.
There was a problem hiding this comment.
Sorry @liam-mackie, Claude got enthusiastic and posted this before I was really ready to reach out to you. Are you the right person to talk to? If so great, but let me shape this up a bit better before bothering to engage with it
There was a problem hiding this comment.
I've got context here, and I'm probably the best person in this particular instance, but it's likely worth reaching out to the k8s requests channel for more information.
…daries Consolidates 9 sync-over-async sites into 5 single-method bridges, each of which is a tiny sync wrapper over a private async implementation. All internal helpers become fully async, removing the in-method GetAwaiter calls in SystemCtlHelper and LinuxServiceConfigurator entirely. Bridges: - PowerShellPrerequisite.Check - KubernetesDirectoryInformationProvider.GetPathUsedBytes - LinuxServiceConfigurator.ConfigureService (replaces 5 prior bridges) - WindowsServiceConfigurator.ConfigureService - CommandLineRunner.Execute (also exposes ExecuteAsync for async callers) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Address PR review on #1236: the sync-over-async hop at the IKubernetesDirectoryInformationProvider boundary was carrying two consumers — a sync override (EnsureDiskHasEnoughFreeSpace) and an async one (CreateScriptContainer). Expose async siblings (GetPathUsedBytesAsync on the provider, GetStorageInformationAsync on KubernetesPhysicalFileSystem) so the async caller awaits directly. The sync GetPathUsedBytes and GetStorageInformation remain for the IOctopusFileSystem override, and the lighthouse comment now names that exact consumer instead of waving at a non-existent background sweeper.
|
|
||
| Action<string> log = s => logger.Information(s); | ||
| var exitCode = SilentProcessRunner.ExecuteCommand( | ||
| // We're in a synchronous public static helper (ExtractTarGzip). The method |
There was a problem hiding this comment.
We need to talk about where this is called from and why this method needs to be sync
| var exitCode = SilentProcessRunner.ExecuteCommand( | ||
|
|
||
| // We're in a synchronous test helper (Execute) that exposes a sync int | ||
| // return and out parameters. The method must return synchronously, so we |
There was a problem hiding this comment.
Why must this method return synchronously?
| this.directoryInformationCache = directoryInformationCache; | ||
| } | ||
|
|
||
| // Sync-over-async bridge for the one remaining sync caller: KubernetesPhysicalFileSystem |
There was a problem hiding this comment.
This requires context of our conversation to understand. Make this match the same format as the other justifications and make it clear and succinct
| serviceConfigurationState); | ||
| } | ||
|
|
||
| // We're at the IServiceConfigurator boundary. IServiceConfigurator is consumed by |
There was a problem hiding this comment.
Apply the same feedback here as we gave on the linux service configurator
Address PR review on #1236: - Plainer two-section "Why this is sync / Why blocking is safe" format across all sync-over-async sites; tests add a "Why low risk" line per reviewer ask. - KubernetesDirectoryInformationProvider, PowerShellPrerequisite, Linux/WindowsServiceConfigurator, KubernetesAgentInstaller, SilentProcessRunnerFixture, LinuxTestUserPrincipal (previously had no justification, now does). - LinuxTentacleFetcher.ExtractTarGzip: both callers are already async, so flip the helper to ExtractTarGzipAsync and remove the sync-over-async entirely. Update NugetTentacleFetcher.ExtractTentacle to await. - Fix WhenSettingUpPollingTentacle_TelemetryEventShouldBeSent: builder was stubbing only the sync Execute overloads, so the ExecuteAsync switch in ReviewAndRunScriptTabViewModel returned a default false Task and the telemetry callback never fired. Stub both overloads.
| // deadlock when the calling thread has a SynchronizationContext. The | ||
| // Kubernetes agent is a console app and doesn't set one up, so there's | ||
| // nothing for the awaited continuation to wait on. | ||
| // See https://blog.stephencleary.com/2012/07/dont-block-on-async-code.html |
There was a problem hiding this comment.
@liam-mackie I think this is the only place I need your eyes.
This change is because we are making something async lower down the call stack to help make it easier to cancel/abandon when a script it not behaving. Most tentacle code is sync so we end up blocking on async code which can cause issues. I think the comment above justifies that this instance below wont cause any issues, but let me know if you think the rationale misses something?
There was a problem hiding this comment.
That sounds about right to me - I don't think the agent has anything that would add a SynchronisationContext - @APErebus just double checking you don't have anything to add here, since you have a bunch of experience here
Address PR review on #1236: the sync and async GetStorageInformation variants had duplicated body. Factor the bytesTotal lookup + tuple assembly into a private sync helper so the sync/async pair we need stays DRY.
SilentProcessRunner.Execute async
Why
Tentacle needs to release the script-isolation mutex even when a script can't be cancelled. Philips is running into situations where CrowdStrike and Rapid7 contention freezes the script running on the Tentacle and our cancel signal can't shake it loose. The only recovery today is restart tentacle.
To make this so in a follow up PR
SilentProcessRunner.ExecuteCommandbecomes async. Which makes everything that calls it async too.This PR just adds the changes to method signatures that need to become async, to make the next riskier PR easier to review.
What changed
Replaces
SilentProcessRunner.ExecuteCommandwithExecuteCommandAsync, plus the matching interfaces (ISilentProcessRunner,ICommandLineRunner) and their wrappers. All ~20 callers in production, integration tests, and test scaffolding.There are 5 places we synchronously call these new async call paths, each documented well in code:
KubernetesDirectoryInformationProvider.GetPathUsedBytes(only called via anIOctopusFileSystemsync override)PowerShellPrerequisite.Check(IPrerequisiteis sync, used by the WPF installer)CommandLineRunner.Execute(sync wrapper for the WPF installer flow)LinuxServiceConfiguratorandWindowsServiceConfigurator(Topshelf'sServiceCommand → AbstractCommand → ICommand.Startchain is sync top to bottom)Risks & mitigations
The classic sync-over-async deadlock is the obvious worry here.
.GetAwaiter().GetResult()hangs when the calling thread has captured aSynchronizationContext. See each call site for why it is safe.Mock vs. production drift in tests was a surprising issue I ran intodue to places we mock
Executewe needed to switch this toExecuteAsync. We hit this onWhenSettingUpPollingTentacle_TelemetryEventShouldBeSentand the wizard model builder now stubs both.For Kubernetes agent owners
This PR touches your ownership area but should cause no behaviour changes.
KubernetesDirectoryInformationProvider: exposes sync (existing) and async (new) ways to read path usage. Sameduinvocation, same 30-second cache; we need both because one downstream is sync and another is async.KubernetesPhysicalFileSystem: same shape. SyncGetStorageInformationstays, new asyncGetStorageInformationAsyncadded. Both go through one private helper so the two never drift.KubernetesScriptPodCreator.CreateScriptContainer: previously blocked on async work indirectly. Nowawaits the new async sibling directly. No behaviour change, just one fewer sync-over-async hop on your script-pod creation path.KubernetesAgentInstaller.Dispose(integration test scaffolding only): still blocks on async becauseDisposeis sync.What we'd love your eyes on: there's one remaining sync-over-async hop in the K8s agent at
EnsureDiskHasEnoughFreeSpace, because it implements a sync filesystem interface. It blocks on the asyncdupath. This is safe in a normal .NET console app because there's noSynchronizationContextfor the awaited continuation to come back to. If you know of anything in the K8s agent's host startup (pod watchers, bootstrap, a custom task scheduler) that would install aSynchronizationContextand break that assumption, the disk-space check is where the deadlock would surface. Worth a quick sanity check against your mental model of how the agent boots up.How to review