support background steps #4416

Open
lokesh755 wants to merge 2 commits into actions:main from lokesh755:lokesh755-background-steps

Contributor

lokesh755 commented May 12, 2026

Summary

Adds runner support for background steps, which enable concurrent step execution within a single GitHub Actions job. This introduces four new workflow YAML keywords: background: true, wait, wait-all, and cancel.

Motivation

Today, all steps in a job run sequentially. Common patterns like "start a dev server, run tests, stop the server" or "upload artifacts while tests continue" require workarounds (& backgrounding, service containers). Background steps provide first-class support for concurrent step execution with explicit synchronization primitives.

What's Changed

New step types (SDK layer)

  • WaitStep, WaitAllStep, CancelStep — new pipeline step types with JSON deserialization support
  • ActionStep.Background — boolean property indicating a step should run asynchronously

Core execution engine (StepsRunner.cs)

  • Background steps launch via Task.Run and execute concurrently with subsequent foreground steps
  • wait: <id> blocks until specific background step(s) complete
  • wait-all: true blocks until all prior background steps complete
  • cancel: <id> sends SIGTERM with a 7.5s grace period
  • Limited to 10 concurrent background steps via SemaphoreSlim
  • Implicit wait-all injected before post-job hooks if any background steps haven't been explicitly waited on (visible in UI as a timeline entry)
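
As a rough illustration of the mechanics listed above, here is a minimal sketch, not the code in this PR. The class BackgroundStepSketch and all of its member names are hypothetical, cooperative cancellation through a CancellationToken stands in for SIGTERM, and blocking in StartAsync when all slots are taken is an assumption about how the limit behaves.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical, simplified stand-in for the runner's background-step tracking.
public sealed class BackgroundStepSketch
{
    // At most 10 background steps run at once, mirroring the limit described above.
    private readonly SemaphoreSlim _slots = new SemaphoreSlim(10);

    // Maps a step id to its running task and the token source used to cancel it.
    private readonly ConcurrentDictionary<string, (Task Run, CancellationTokenSource Cts)> _steps = new();

    // background: true -> start the step once a slot is free, then return
    // so that later foreground steps can proceed while it runs.
    public async Task StartAsync(string id, Func<CancellationToken, Task> body)
    {
        await _slots.WaitAsync();
        var cts = new CancellationTokenSource();
        var run = Task.Run(async () =>
        {
            try
            {
                await body(cts.Token);
            }
            catch (OperationCanceledException)
            {
                // Assumption: a cancelled step is treated as finished, not as a fault.
            }
            finally
            {
                _slots.Release();
            }
        });
        _steps[id] = (run, cts);
    }

    // wait: <id> -> block until the named background step(s) complete.
    public Task WaitAsync(params string[] ids) =>
        Task.WhenAll(ids.Select(id => _steps[id].Run));

    // wait-all: true -> block until every background step started so far completes.
    public Task WaitAllAsync() =>
        Task.WhenAll(_steps.Values.Select(s => s.Run));

    // cancel: <id> -> request cancellation, then give the step a grace period.
    // Returns true if the step finished within the grace period.
    public async Task<bool> CancelAsync(string id, TimeSpan gracePeriod)
    {
        var (run, cts) = _steps[id];
        cts.Cancel();
        var first = await Task.WhenAny(run, Task.Delay(gracePeriod));
        return first == run;
    }
}
```

With this sketch, the "start a dev server, run tests, stop the server" pattern would be StartAsync("server", ...), then the foreground test step, then CancelAsync("server", TimeSpan.FromSeconds(7.5)).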

Thread safety (StepsContext.cs)

  • Added lock around SetOutput, SetConclusion, SetOutcome — background steps write to the shared steps context from thread pool threads
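
The locking pattern can be sketched as follows. This is an illustration under assumptions, not the actual StepsContext: the class StepsContextSketch, its dictionary layout, and the CountOutputs helper are hypothetical and only show why writes from thread pool threads need a shared lock.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical, simplified stand-in for the shared steps context.
public sealed class StepsContextSketch
{
    private readonly object _lock = new object();

    // step id -> (output name -> value); a plain Dictionary is not safe for concurrent writes.
    private readonly Dictionary<string, Dictionary<string, string>> _outputs = new();

    // Every mutation takes the same lock, so concurrent background steps cannot corrupt the maps.
    public void SetOutput(string stepId, string name, string value)
    {
        lock (_lock)
        {
            if (!_outputs.TryGetValue(stepId, out var outputs))
            {
                outputs = new Dictionary<string, string>();
                _outputs[stepId] = outputs;
            }
            outputs[name] = value;
        }
    }

    // Reads take the lock as well, so they never observe a dictionary mid-update.
    public int CountOutputs(string stepId)
    {
        lock (_lock)
        {
            return _outputs.TryGetValue(stepId, out var outputs) ? outputs.Count : 0;
        }
    }
}
```

For example, calling SetOutput from Parallel.For with 1000 distinct output names for the same step leaves exactly 1000 entries, whereas unsynchronized writes to a plain Dictionary could throw or lose entries.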

Copilot AI review requested due to automatic review settings May 12, 2026 14:25
@lokesh755 lokesh755 requested a review from a team as a code owner May 12, 2026 14:25
Contributor

Copilot AI left a comment

Pull request overview

Adds first-class “background step” support to the runner by allowing action steps to run concurrently, introducing control steps (wait / wait-all / cancel), and surfacing related metadata to the Results service contracts.

Changes:

  • Introduces new pipeline step types (Wait, WaitAll, Cancel) and runner step implementations to coordinate background execution.
  • Extends StepsRunner to start background actions concurrently, provide wait/cancel semantics, and attempt to isolate per-step GitHub context.
  • Adds Results/RunService contract fields and L0 tests to validate concurrency and metadata propagation.
| File | Description |
| --- | --- |
| src/Test/L0/Worker/BackgroundStepsL0.cs | Adds L0 coverage for concurrent background steps, waits, cancels, and steps-context thread-safety. |
| src/Sdk/WebApi/WebApi/ResultsHttpClient.cs | Adds mapping of timeline record variables into workflow step payloads (but also includes debug file logging). |
| src/Sdk/WebApi/WebApi/Contracts.cs | Extends Results service Step contract with background/control-step metadata DTOs. |
| src/Sdk/RSWebApi/Contracts/StepResult.cs | Extends RunService step result contract with background/control-step metadata fields. |
| src/Sdk/DTPipelines/Pipelines/WaitStep.cs | Adds pipeline model for a “wait” step. |
| src/Sdk/DTPipelines/Pipelines/WaitAllStep.cs | Adds pipeline model for a “wait-all” step. |
| src/Sdk/DTPipelines/Pipelines/CancelStep.cs | Adds pipeline model for a “cancel” step. |
| src/Sdk/DTPipelines/Pipelines/StepConverter.cs | Enables JSON deserialization for new step types. |
| src/Sdk/DTPipelines/Pipelines/Step.cs | Registers new known step types and extends the StepType enum. |
| src/Sdk/DTPipelines/Pipelines/ActionStep.cs | Adds Background flag and ensures it’s cloned. |
| src/Runner.Worker/WaitStepRunner.cs | Adds runner step type for “wait” control step. |
| src/Runner.Worker/WaitAllStepRunner.cs | Adds runner step type for “wait-all” control step. |
| src/Runner.Worker/CancelStepRunner.cs | Adds runner step type for “cancel” control step. |
| src/Runner.Worker/StepsRunner.cs | Implements background execution, wait/wait-all/cancel handling, slot limiting, and GitHubContext isolation. |
| src/Runner.Worker/StepsContext.cs | Adds locking around step output/outcome/conclusion mutations. |
| src/Runner.Worker/JobExtension.cs | Wires new pipeline step types into job initialization and sets timeline variables for results metadata. |
| src/Runner.Worker/ExecutionContext.cs | Adds timeline-record variable setter and maps those variables into StepResult; adjusts template evaluator feature gating. |
| src/Runner.Worker/BackgroundStepContext.cs | Adds per-background-step tracking (task, CTS, result, external id). |

Copilot's findings

Comments suppressed due to low confidence (4)

src/Sdk/WebApi/WebApi/ResultsHttpClient.cs:568

  • File.AppendAllText("/tmp/bg-steps-debug.log", ...) is executed without a try/catch here. If the path is unavailable (e.g., Windows runner, locked filesystem, permissions), it will throw and can break step updates. Remove this statement or guard it behind safe, platform-appropriate diagnostics.

            System.IO.File.AppendAllText("/tmp/bg-steps-debug.log", $"[BG-DEBUG]   Result: name={step.Name}, isBackground={step.IsBackground}, stepType={step.StepType}\n");
            return step;

src/Sdk/WebApi/WebApi/ResultsHttpClient.cs:644

  • Serializing and logging full StepsUpdateRequest JSON payloads to /tmp can expose tokens/PII (steps metadata can contain user-controlled values) and adds unbounded I/O in a hot path. Please remove this debug logging or route it through existing trace logging with appropriate redaction and opt-in controls.
                // DEBUG: Serialize and log the JSON payload
                try
                {
                    var json = Newtonsoft.Json.JsonConvert.SerializeObject(request, Newtonsoft.Json.Formatting.None);
                    System.IO.File.AppendAllText("/tmp/bg-steps-debug.log", $"[BG-DEBUG] JSON payload: {json}\n");
                }
                catch (Exception ex)
                {
                    System.IO.File.AppendAllText("/tmp/bg-steps-debug.log", $"[BG-DEBUG] Serialize error: {ex.Message}\n");
                }

src/Runner.Worker/StepsRunner.cs:605

  • In HandleWaitAsync, the local completed is assigned but never used. With TreatWarningsAsErrors enabled in this repo, this will fail the build. Remove the unused assignment (or use it if intended).
                Trace.Info($"Waiting for {tasks.Count} background step(s)...");
                var cancelTask = Task.Delay(Timeout.Infinite, cancellationToken);
                var completed = await Task.WhenAny(Task.WhenAll(tasks), cancelTask);
                if (cancellationToken.IsCancellationRequested)
                {

src/Runner.Worker/JobExtension.cs:494

  • This sets cancel_step_id on the cancel step timeline record to the logical step id, but later logic expects to publish the background step's external/timeline id. If StepsRunner can't resolve the id, this logical value will be reported upstream. Consider deferring this variable until you can resolve the external id (or always overwrite/clear it during execution).
                            cancelRunner.ExecutionContext.SetTimelineRecordVariable("step_type", "cancel");
                            if (!string.IsNullOrEmpty(cancelRunner.CancelStepId))
                            {
                                cancelRunner.ExecutionContext.SetTimelineRecordVariable("cancel_step_id", cancelRunner.CancelStepId);
                            }
  • Files reviewed: 18/18 changed files
  • Comments generated: 14

Comment on lines +517 to +527

    // DEBUG: Log all variables on this timeline record to a file
    try
    {
        var debugLine = $"[BG-DEBUG] ConvertTimelineRecordToStep: name={r.Name}, id={r.Id}, variableCount={r.Variables.Count}";
        foreach (var kvp in r.Variables)
        {
            debugLine += $"\n Variable: {kvp.Key}={kvp.Value.Value}";
        }
        System.IO.File.AppendAllText("/tmp/bg-steps-debug.log", debugLine + "\n");
    }
    catch { }

Comment on lines +40 to 47

    // Track active background steps
    private readonly ConcurrentDictionary<string, BackgroundStepContext> _backgroundSteps = new();
    private readonly HashSet<string> _waitedStepIds = new();
    private readonly SemaphoreSlim _backgroundSlotSemaphore = new(10); // max 10 concurrent background steps

    // StepsRunner should never throw exception to caller
    public async Task RunAsync(IExecutionContext jobContext)
    {

Comment on lines +658 to +671

        // Wait for all to finish with a grace period
        var gracePeriod = TimeSpan.FromSeconds(7.5);
        var allTasks = activeSteps.Select(bg => bg.ExecutionTask).ToArray();
        await Task.WhenAny(Task.WhenAll(allTasks), Task.Delay(gracePeriod));

        var stillRunning = activeSteps.Where(bg => !bg.IsCompleted).ToList();
        if (stillRunning.Count > 0)
        {
            Trace.Warning($"{stillRunning.Count} background step(s) did not terminate gracefully.");
        }

        // Final wait for all tasks to complete
        await Task.WhenAll(allTasks);
    }

Comment on lines +686 to +697

        // Wait for grace period (7.5 seconds)
        var gracePeriod = TimeSpan.FromSeconds(7.5);
        await Task.WhenAny(bgCtx.ExecutionTask, Task.Delay(gracePeriod));

        if (!bgCtx.IsCompleted)
        {
            Trace.Warning($"Background step '{cancelStep.CancelStepId}' did not terminate gracefully after {gracePeriod.TotalSeconds}s.");
        }
    }

    await bgCtx.ExecutionTask;
    Trace.Info($"Background step '{cancelStep.CancelStepId}' cancelled/completed.");

Comment on lines +548 to +560

        if (externalIds.Count > 0)
        {
            executionContext.SetTimelineRecordVariable("wait_step_ids", string.Join(",", externalIds));
        }
    }

    private void SetCancelStepIdTimelineVariable(IExecutionContext executionContext, string logicalStepId)
    {
        var externalId = GetBackgroundExternalId(logicalStepId);
        if (!string.IsNullOrEmpty(externalId))
        {
            executionContext.SetTimelineRecordVariable("cancel_step_id", externalId);
        }

Comment on lines +1 to +5

    using System;
    using System.ComponentModel;
    using System.Runtime.Serialization;
    using GitHub.DistributedTask.ObjectTemplating.Tokens;
    using Newtonsoft.Json;

Comment on lines +1 to +5

    using System;
    using System.ComponentModel;
    using System.Runtime.Serialization;
    using GitHub.DistributedTask.ObjectTemplating.Tokens;
    using Newtonsoft.Json;

@@ -0,0 +1,27 @@

    using System;
    using System.Collections.Concurrent;

Comment on lines +210 to +224

    // Arrange: background step that runs until cancelled
    var bgCts = new CancellationTokenSource();

    var bgStep = CreateStep(hc, TaskResult.Succeeded, "success()", name: "server", contextName: "server");
    bgStep.Setup(x => x.RunAsync()).Returns(async () =>
    {
        try
        {
            await Task.Delay(TimeSpan.FromSeconds(30), bgCts.Token);
        }
        catch (OperationCanceledException)
        {
            throw;
        }
    });

@@ -1440,7 +1463,6 @@ public static IPipelineTemplateEvaluator ToPipelineTemplateEvaluator(this IExecu

    return new PipelineTemplateEvaluator(traceWriter, schema, context.Global.FileTable)
    {
        MaxErrorMessageLength = int.MaxValue, // Don't truncate error messages otherwise we might not scrub secrets correctly

    public sealed class StepsRunner : RunnerService, IStepsRunner
    {
        // Track active background steps
        private readonly ConcurrentDictionary<string, BackgroundStepContext> _backgroundSteps = new();

Collaborator

Does this need to be a ConcurrentDictionary?


// Implicit wait-all before post-job hooks:
// If any background steps haven't been waited on, inject a visible wait-all step.
if (_backgroundSteps.Count > 0)
Collaborator

I'm wondering if this is something that should be done in "Set up job" instead.

That's where all the pre/post/etc stuff is registered today.

actionStep.ExecutionContext = jobContext.CreateChild(actionStep.Action.Id, actionStep.DisplayName, actionStep.Action.Name, null, actionStep.Action.ContextName, ActionRunStage.Main, intraActionState);

// Store background step metadata on the timeline record for results service
if (actionStep.Action?.Background == true)
Collaborator

@ericsciple ericsciple May 12, 2026

I'm wondering whether pre/post should not follow background

That is:

  • pre-stage would be sequential
  • main-stage could have background
  • implicit wait at the end of main-stage
  • post-stage is sequential

Otherwise might need implicit waits at the end of each stage, which might be fine too.

var externalIds = GetBackgroundExternalIds(logicalStepIds);
if (externalIds.Count > 0)
{
executionContext.SetTimelineRecordVariable("wait_step_ids", string.Join(",", externalIds));
Collaborator

@ericsciple ericsciple May 12, 2026

Per copilot:

The same key (wait_step_ids / cancel_step_id) is written twice with different value semantics:

  • Setup-time write: comma-separated logical ids (from waitStep.WaitStepIds).
  • Runtime overwrite: comma-separated external GUIDs (from BackgroundStepContext.ExternalId).

Both are queued to the results service. The consumer sees the logical-ids version first, then sees them replaced with GUIDs later. That's confusing and probably unintentional.


// Start
step.ExecutionContext.Start();
bool isBackground = false;
Collaborator

nit: consider:

var isBackground = (step as IActionRunner)?.Action?.Background ?? false;

var externalIds = GetBackgroundExternalIds(logicalStepIds);
if (externalIds.Count > 0)
{
executionContext.SetTimelineRecordVariable("wait_step_ids", string.Join(",", externalIds));
Collaborator

I'm wondering if all the timeline record variable stuff can be contained to "Set up job". Per copilot, all the external IDs are known during "set up job".

Collaborator

Also curious @TingluoHuang's thoughts on timeline record variable approach in general.

Probably makes sense, but not something I normally think deeply about.

Collaborator

I'm wondering whether we should have first class properties instead of timeline record variables, which iiuc we don't otherwise use today.

public CancellationTokenSource Cts { get; set; }
public GitHub.DistributedTask.WebApi.TaskResult? Result { get; set; }
public bool IsCompleted => ExecutionTask?.IsCompleted ?? false;
public string ExternalId => Step.ExecutionContext == null || Step.ExecutionContext.Id == Guid.Empty ? null : Step.ExecutionContext.Id.ToString("N");
Collaborator

@ericsciple ericsciple May 12, 2026

I'm wondering if Step.ExecutionContext.Id is ever null or empty?

iiuc external IDs are known during "Set up job"

var stepOrder = 0;
foreach (var step in message.Steps)
{
stepOrder++;
Collaborator

is this used?

var outputs = step["outputs"].AssertDictionary("outputs");
outputs[outputName] = new StringContextData(value);
if (_propertyRegex.IsMatch(outputName))
lock (_lock)
Collaborator

I'm thinking background step outputs shouldn't be available to foreground steps before the wait synchronization point

var stepUpdateEndpoint = new Uri(m_resultsServiceUrl, Constants.WorkflowStepsUpdate);
foreach (var request in stepUpdateRequests)
{
// DEBUG: Serialize and log the JSON payload
Collaborator

it looks like this should be deleted


try
{
System.IO.File.AppendAllText("/tmp/bg-steps-debug.log", $"[BG-DEBUG] UpdateWorkflowStepsAsync: {stepRecords.Count} task records\n");
Collaborator

should be deleted?

{
debugLine += $"\n Variable: {kvp.Key}={kvp.Value.Value}";
}
System.IO.File.AppendAllText("/tmp/bg-steps-debug.log", debugLine + "\n");
Collaborator

delete


[DataContract]
[JsonObject(NamingStrategyType = typeof(SnakeCaseNamingStrategy))]
public class WaitControlDto
Collaborator

What is "Dto" ?

3 participants