support background steps #4416
Pull request overview
Adds first-class “background step” support to the runner by allowing action steps to run concurrently, introducing control steps (wait / wait-all / cancel), and surfacing related metadata to the Results service contracts.
Changes:
- Introduces new pipeline step types (Wait, WaitAll, Cancel) and runner step implementations to coordinate background execution.
- Extends `StepsRunner` to start background actions concurrently, provide wait/cancel semantics, and attempt to isolate per-step GitHub context.
- Adds Results/RunService contract fields and L0 tests to validate concurrency and metadata propagation.
Show a summary per file
| File | Description |
|---|---|
| src/Test/L0/Worker/BackgroundStepsL0.cs | Adds L0 coverage for concurrent background steps, waits, cancels, and steps-context thread-safety. |
| src/Sdk/WebApi/WebApi/ResultsHttpClient.cs | Adds mapping of timeline record variables into workflow step payloads (but also includes debug file logging). |
| src/Sdk/WebApi/WebApi/Contracts.cs | Extends Results service Step contract with background/control-step metadata DTOs. |
| src/Sdk/RSWebApi/Contracts/StepResult.cs | Extends RunService step result contract with background/control-step metadata fields. |
| src/Sdk/DTPipelines/Pipelines/WaitStep.cs | Adds pipeline model for a “wait” step. |
| src/Sdk/DTPipelines/Pipelines/WaitAllStep.cs | Adds pipeline model for a “wait-all” step. |
| src/Sdk/DTPipelines/Pipelines/CancelStep.cs | Adds pipeline model for a “cancel” step. |
| src/Sdk/DTPipelines/Pipelines/StepConverter.cs | Enables JSON deserialization for new step types. |
| src/Sdk/DTPipelines/Pipelines/Step.cs | Registers new known step types and extends the StepType enum. |
| src/Sdk/DTPipelines/Pipelines/ActionStep.cs | Adds Background flag and ensures it’s cloned. |
| src/Runner.Worker/WaitStepRunner.cs | Adds runner step type for “wait” control step. |
| src/Runner.Worker/WaitAllStepRunner.cs | Adds runner step type for “wait-all” control step. |
| src/Runner.Worker/CancelStepRunner.cs | Adds runner step type for “cancel” control step. |
| src/Runner.Worker/StepsRunner.cs | Implements background execution, wait/wait-all/cancel handling, slot limiting, and GitHubContext isolation. |
| src/Runner.Worker/StepsContext.cs | Adds locking around step output/outcome/conclusion mutations. |
| src/Runner.Worker/JobExtension.cs | Wires new pipeline step types into job initialization and sets timeline variables for results metadata. |
| src/Runner.Worker/ExecutionContext.cs | Adds timeline-record variable setter and maps those variables into StepResult; adjusts template evaluator feature gating. |
| src/Runner.Worker/BackgroundStepContext.cs | Adds per-background-step tracking (task, CTS, result, external id). |
Copilot's findings
Comments suppressed due to low confidence (4)
src/Sdk/WebApi/WebApi/ResultsHttpClient.cs:568
- `File.AppendAllText("/tmp/bg-steps-debug.log", ...)` is executed without a try/catch here. If the path is unavailable (e.g., Windows runner, locked filesystem, permissions), it will throw and can break step updates. Remove this statement or guard it behind safe, platform-appropriate diagnostics.
System.IO.File.AppendAllText("/tmp/bg-steps-debug.log", $"[BG-DEBUG] Result: name={step.Name}, isBackground={step.IsBackground}, stepType={step.StepType}\n");
return step;
src/Sdk/WebApi/WebApi/ResultsHttpClient.cs:644
- Serializing and logging full `StepsUpdateRequest` JSON payloads to `/tmp` can expose tokens/PII (steps metadata can contain user-controlled values) and adds unbounded I/O in a hot path. Please remove this debug logging or route it through existing trace logging with appropriate redaction and opt-in controls.
// DEBUG: Serialize and log the JSON payload
try
{
    var json = Newtonsoft.Json.JsonConvert.SerializeObject(request, Newtonsoft.Json.Formatting.None);
    System.IO.File.AppendAllText("/tmp/bg-steps-debug.log", $"[BG-DEBUG] JSON payload: {json}\n");
}
catch (Exception ex)
{
    System.IO.File.AppendAllText("/tmp/bg-steps-debug.log", $"[BG-DEBUG] Serialize error: {ex.Message}\n");
}
src/Runner.Worker/StepsRunner.cs:605
- In `HandleWaitAsync`, the local `completed` is assigned but never used. With TreatWarningsAsErrors enabled in this repo, this will fail the build. Remove the unused assignment (or use it if intended).
Trace.Info($"Waiting for {tasks.Count} background step(s)...");
var cancelTask = Task.Delay(Timeout.Infinite, cancellationToken);
var completed = await Task.WhenAny(Task.WhenAll(tasks), cancelTask);
if (cancellationToken.IsCancellationRequested)
{
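One way to make the result of `Task.WhenAny` meaningful, rather than leaving it in an unused local, is to compare it against the combined task. The following is a minimal sketch under that assumption; `WaitForAllOrCancelAsync` and `WaitSketch` are hypothetical names, not part of the PR, and this is not the PR's `HandleWaitAsync`.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class WaitSketch
{
    // Hypothetical helper (not from the PR).
    // Returns true if every task finished, false if cancellation won the race.
    public static async Task<bool> WaitForAllOrCancelAsync(
        IReadOnlyCollection<Task> tasks, CancellationToken cancellationToken)
    {
        var allDone = Task.WhenAll(tasks);
        // Completes only when the token is cancelled.
        var cancelTask = Task.Delay(Timeout.Infinite, cancellationToken);

        // The task returned by WhenAny tells us which side finished first.
        var first = await Task.WhenAny(allDone, cancelTask);
        if (first != allDone)
        {
            return false;
        }

        // Await the combined task so any exception from a step is observed.
        await allDone;
        return true;
    }

    public static async Task Main()
    {
        using var cts = new CancellationTokenSource();
        var tasks = new List<Task> { Task.Delay(50), Task.Delay(100) };
        var completed = await WaitForAllOrCancelAsync(tasks, cts.Token);
        Console.WriteLine($"all completed: {completed}");
    }
}
```

If the outcome of the race is not needed at all, simply awaiting `Task.WhenAny(...)` without assigning the result also avoids the unused local.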
src/Runner.Worker/JobExtension.cs:494
- This sets `cancel_step_id` on the cancel step timeline record to the logical step id, but later logic expects to publish the background step's external/timeline id. If `StepsRunner` can't resolve the id, this logical value will be reported upstream. Consider deferring this variable until you can resolve the external id (or always overwrite/clear it during execution).
cancelRunner.ExecutionContext.SetTimelineRecordVariable("step_type", "cancel");
if (!string.IsNullOrEmpty(cancelRunner.CancelStepId))
{
    cancelRunner.ExecutionContext.SetTimelineRecordVariable("cancel_step_id", cancelRunner.CancelStepId);
}
- Files reviewed: 18/18 changed files
- Comments generated: 14
// DEBUG: Log all variables on this timeline record to a file
try
{
    var debugLine = $"[BG-DEBUG] ConvertTimelineRecordToStep: name={r.Name}, id={r.Id}, variableCount={r.Variables.Count}";
    foreach (var kvp in r.Variables)
    {
        debugLine += $"\n Variable: {kvp.Key}={kvp.Value.Value}";
    }
    System.IO.File.AppendAllText("/tmp/bg-steps-debug.log", debugLine + "\n");
}
catch { }

// Track active background steps
private readonly ConcurrentDictionary<string, BackgroundStepContext> _backgroundSteps = new();
private readonly HashSet<string> _waitedStepIds = new();
private readonly SemaphoreSlim _backgroundSlotSemaphore = new(10); // max 10 concurrent background steps

// StepsRunner should never throw exception to caller
public async Task RunAsync(IExecutionContext jobContext)
{

// Wait for all to finish with a grace period
var gracePeriod = TimeSpan.FromSeconds(7.5);
var allTasks = activeSteps.Select(bg => bg.ExecutionTask).ToArray();
await Task.WhenAny(Task.WhenAll(allTasks), Task.Delay(gracePeriod));

var stillRunning = activeSteps.Where(bg => !bg.IsCompleted).ToList();
if (stillRunning.Count > 0)
{
    Trace.Warning($"{stillRunning.Count} background step(s) did not terminate gracefully.");
}

// Final wait for all tasks to complete
await Task.WhenAll(allTasks);
}
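The cancel-then-wait pattern in this snippet can be isolated as a small, runnable sketch. `CancelWithGraceAsync`, `RunUntilCancelledAsync`, and `GracePeriodSketch` are hypothetical names used only for illustration; the 7.5 second value is taken from the snippet, and the worker here is assumed to honor its cancellation token.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class GracePeriodSketch
{
    // Hypothetical helper (not from the PR): request cancellation, then wait
    // up to gracePeriod. Returns true if the work ended within that window.
    public static async Task<bool> CancelWithGraceAsync(
        Task work, CancellationTokenSource cts, TimeSpan gracePeriod)
    {
        cts.Cancel();
        var finished = await Task.WhenAny(work, Task.Delay(gracePeriod));
        return finished == work;
    }

    // Stand-in for a background step that runs until it is cancelled.
    private static async Task RunUntilCancelledAsync(CancellationToken token)
    {
        try
        {
            await Task.Delay(TimeSpan.FromSeconds(30), token);
        }
        catch (OperationCanceledException)
        {
            // Cooperative shutdown: treat cancellation as a normal exit.
        }
    }

    public static async Task Main()
    {
        using var cts = new CancellationTokenSource();
        var work = RunUntilCancelledAsync(cts.Token);

        var graceful = await CancelWithGraceAsync(work, cts, TimeSpan.FromSeconds(7.5));
        Console.WriteLine(graceful
            ? "stopped within the grace period"
            : "still running after the grace period");
    }
}
```

Note that an unconditional `await` on the same task after the grace period, as in the final wait above, only returns once the task actually ends, so a step that ignores cancellation would keep the caller waiting.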
// Wait for grace period (7.5 seconds)
var gracePeriod = TimeSpan.FromSeconds(7.5);
await Task.WhenAny(bgCtx.ExecutionTask, Task.Delay(gracePeriod));

if (!bgCtx.IsCompleted)
{
    Trace.Warning($"Background step '{cancelStep.CancelStepId}' did not terminate gracefully after {gracePeriod.TotalSeconds}s.");
}
}

await bgCtx.ExecutionTask;
Trace.Info($"Background step '{cancelStep.CancelStepId}' cancelled/completed.");

if (externalIds.Count > 0)
{
    executionContext.SetTimelineRecordVariable("wait_step_ids", string.Join(",", externalIds));
}
}

private void SetCancelStepIdTimelineVariable(IExecutionContext executionContext, string logicalStepId)
{
    var externalId = GetBackgroundExternalId(logicalStepId);
    if (!string.IsNullOrEmpty(externalId))
    {
        executionContext.SetTimelineRecordVariable("cancel_step_id", externalId);
    }

using System;
using System.ComponentModel;
using System.Runtime.Serialization;
using GitHub.DistributedTask.ObjectTemplating.Tokens;
using Newtonsoft.Json;

using System;
using System.ComponentModel;
using System.Runtime.Serialization;
using GitHub.DistributedTask.ObjectTemplating.Tokens;
using Newtonsoft.Json;

@@ -0,0 +1,27 @@
using System;
using System.Collections.Concurrent;

// Arrange: background step that runs until cancelled
var bgCts = new CancellationTokenSource();

var bgStep = CreateStep(hc, TaskResult.Succeeded, "success()", name: "server", contextName: "server");
bgStep.Setup(x => x.RunAsync()).Returns(async () =>
{
    try
    {
        await Task.Delay(TimeSpan.FromSeconds(30), bgCts.Token);
    }
    catch (OperationCanceledException)
    {
        throw;
    }
});

@@ -1440,7 +1463,6 @@ public static IPipelineTemplateEvaluator ToPipelineTemplateEvaluator(this IExecu
return new PipelineTemplateEvaluator(traceWriter, schema, context.Global.FileTable)
{
    MaxErrorMessageLength = int.MaxValue, // Don't truncate error messages otherwise we might not scrub secrets correctly

public sealed class StepsRunner : RunnerService, IStepsRunner
{
    // Track active background steps
    private readonly ConcurrentDictionary<string, BackgroundStepContext> _backgroundSteps = new();

Does this need to be a ConcurrentDictionary?

// Implicit wait-all before post-job hooks:
// If any background steps haven't been waited on, inject a visible wait-all step.
if (_backgroundSteps.Count > 0)

I'm wondering if this is something that should be done in "Set up job" instead.
That's where all the pre/post/etc stuff is registered today.
actionStep.ExecutionContext = jobContext.CreateChild(actionStep.Action.Id, actionStep.DisplayName, actionStep.Action.Name, null, actionStep.Action.ContextName, ActionRunStage.Main, intraActionState);

// Store background step metadata on the timeline record for results service
if (actionStep.Action?.Background == true)

I'm wondering whether pre/post should not follow background
That is:
- pre-stage would be sequential
- main-stage could have background
- implicit wait at the end of main-stage
- post-stage is sequential
Otherwise might need implicit waits at the end of each stage, which might be fine too.
var externalIds = GetBackgroundExternalIds(logicalStepIds);
if (externalIds.Count > 0)
{
    executionContext.SetTimelineRecordVariable("wait_step_ids", string.Join(",", externalIds));

Per copilot:
The same key (wait_step_ids / cancel_step_id) is written twice with different value semantics:
- Setup-time write: comma-separated logical ids (from waitStep.WaitStepIds).
- Runtime overwrite: comma-separated external GUIDs (from BackgroundStepContext.ExternalId).
Both are queued to the results service. The consumer sees the logical-ids version first, then sees them replaced with GUIDs later. That's confusing and probably unintentional.

// Start
step.ExecutionContext.Start();
bool isBackground = false;

nit: consider:
var isBackground = (step as IActionRunner)?.Action?.Background ?? false;
var externalIds = GetBackgroundExternalIds(logicalStepIds);
if (externalIds.Count > 0)
{
    executionContext.SetTimelineRecordVariable("wait_step_ids", string.Join(",", externalIds));

I'm wondering if all the timeline record variable stuff can be contained to "Set up job". Per copilot, all the external IDs are known during "set up job".
Also curious @TingluoHuang's thoughts on timeline record variable approach in general.
Probably makes sense, but not something I normally think deeply about.
I'm wondering whether we should have first class properties instead of timeline record variables, which iiuc we don't otherwise use today.
public CancellationTokenSource Cts { get; set; }
public GitHub.DistributedTask.WebApi.TaskResult? Result { get; set; }
public bool IsCompleted => ExecutionTask?.IsCompleted ?? false;
public string ExternalId => Step.ExecutionContext == null || Step.ExecutionContext.Id == Guid.Empty ? null : Step.ExecutionContext.Id.ToString("N");

I'm wondering if Step.ExecutionContext.Id is ever null or empty?
iiuc external IDs are known during "Set up job"
var stepOrder = 0;
foreach (var step in message.Steps)
{
    stepOrder++;

var outputs = step["outputs"].AssertDictionary("outputs");
outputs[outputName] = new StringContextData(value);
if (_propertyRegex.IsMatch(outputName))
lock (_lock)

I'm thinking background step outputs shouldn't be available to foreground steps before the wait synchronization point
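For context on what the added `lock` does and does not provide, here is a minimal sketch assuming a shared dictionary of outputs written from several thread pool tasks. `SharedOutputs` is a hypothetical stand-in, not the runner's `StepsContext`. The lock makes each write atomic with respect to other writers and readers; it does not by itself hide a background step's outputs from foreground steps until a wait, which is the separate ordering concern raised in this comment.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in for a shared steps context (not the runner's type).
public sealed class SharedOutputs
{
    private readonly object _lock = new object();
    private readonly Dictionary<string, string> _outputs = new Dictionary<string, string>();

    public void SetOutput(string stepId, string name, string value)
    {
        // Dictionary<TKey, TValue> is not safe for concurrent writers,
        // so every mutation goes through the same lock.
        lock (_lock)
        {
            _outputs[$"{stepId}.{name}"] = value;
        }
    }

    public int Count
    {
        get
        {
            lock (_lock)
            {
                return _outputs.Count;
            }
        }
    }
}

public static class Program
{
    public static async Task Main()
    {
        var outputs = new SharedOutputs();

        // Eight concurrent writers, each writing 1000 distinct keys.
        var writers = Enumerable.Range(0, 8).Select(i => Task.Run(() =>
        {
            for (var j = 0; j < 1000; j++)
            {
                outputs.SetOutput($"step{i}", $"out{j}", "value");
            }
        }));

        await Task.WhenAll(writers);
        Console.WriteLine(outputs.Count);
    }
}
```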
var stepUpdateEndpoint = new Uri(m_resultsServiceUrl, Constants.WorkflowStepsUpdate);
foreach (var request in stepUpdateRequests)
{
    // DEBUG: Serialize and log the JSON payload

it looks like this should be deleted

try
{
    System.IO.File.AppendAllText("/tmp/bg-steps-debug.log", $"[BG-DEBUG] UpdateWorkflowStepsAsync: {stepRecords.Count} task records\n");

{
    debugLine += $"\n Variable: {kvp.Key}={kvp.Value.Value}";
}
System.IO.File.AppendAllText("/tmp/bg-steps-debug.log", debugLine + "\n");

[DataContract]
[JsonObject(NamingStrategyType = typeof(SnakeCaseNamingStrategy))]
public class WaitControlDto

Summary
Adds runner support for background steps, enabling concurrent step execution within a single GitHub Actions job. This introduces four new workflow YAML keywords: `background: true`, `wait`, `wait-all`, and `cancel`.
Motivation
Today, all steps in a job run sequentially. Common patterns like "start a dev server, run tests, stop the server" or "upload artifacts while tests continue" require workarounds (`&` backgrounding, service containers). Background steps provide first-class support for concurrent step execution with explicit synchronization primitives.
What's Changed
New step types (SDK layer)
- `WaitStep`, `WaitAllStep`, `CancelStep`: new pipeline step types with JSON deserialization support
- `ActionStep.Background`: boolean property indicating a step should run asynchronously
Core execution engine (`StepsRunner.cs`)
- Background steps start via `Task.Run` and execute concurrently with subsequent foreground steps
- `wait: <id>` blocks until specific background step(s) complete
- `wait-all: true` blocks until all prior background steps complete
- `cancel: <id>` sends SIGTERM with a 7.5s grace period
- At most 10 background steps run concurrently, enforced via `SemaphoreSlim`
- Implicit `wait-all` injected before post-job hooks if any background steps haven't been explicitly waited on (visible in UI as a timeline entry)
Thread safety (`StepsContext.cs`)
- `lock` around `SetOutput`, `SetConclusion`, `SetOutcome`: background steps write to the shared steps context from thread pool threads
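
As a rough illustration of how the four keywords might fit together in a workflow, the fragment below sketches the "start a dev server, run tests, stop the server" pattern from the motivation. The excerpt does not show the final schema, so the placement of `background`, `wait`, `wait-all`, and `cancel` here is an assumption, and the step ids and commands are hypothetical.

```yaml
# Hypothetical sketch: keyword placement is assumed from the PR description,
# not taken from a published schema.
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - id: server
        run: npm run serve        # keeps running while later steps execute
        background: true
      - id: upload
        run: ./upload-artifacts.sh
        background: true
      - run: npm test             # foreground step, runs concurrently with the two above
      - wait: upload              # block until the upload step completes
      - cancel: server            # SIGTERM, then a 7.5s grace period
      # Alternatively, wait for every outstanding background step:
      # - wait-all: true
```

Per the description above, omitting the explicit `wait` would still result in an implicit `wait-all` before post-job hooks.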