Async benchmarks show regressions using 0.16.0 prereleases #3139

@martincostello

I've been benchmarking some applications to compare .NET 10 with .NET 11. That required moving to BenchmarkDotNet 0.16.0 prereleases, because 0.15.8 does not support apps targeting net11.0. The benchmarks are closer to end-to-end tests than microbenchmarks (e.g. a whole HTTP request), and they are async.

The benchmarks essentially look like the code below. The core of each benchmark lives in a private method so it can be reused across benchmarks with different setups:

[Benchmark]
public Task<int> SomeScenarioName() => SendRequestsAsync();

private async Task<int> SendRequestsAsync()
{
    int statusCode = 0;

    for (int i = 0; i < OperationsPerInvoke; i++)
    {
        using var response = await _client!.GetAsync(Endpoint, HttpCompletionOption.ResponseHeadersRead);
        response.EnsureSuccessStatusCode();
        statusCode += (int)response.StatusCode;
    }

    return statusCode;
}

Comparing the two setups appears to show large regressions in both duration and allocations. For example:

.NET 10 + 0.15.8:

BenchmarkDotNet v0.15.8, Windows 11 (10.0.26200.8457/25H2/2025Update/HudsonValley2)
13th Gen Intel Core i7-13700H 2.90GHz, 1 CPU, 20 logical and 14 physical cores
.NET SDK 10.0.300
  [Host]     : .NET 10.0.8 (10.0.8, 10.0.826.23019), X64 RyuJIT x86-64-v3
  DefaultJob : .NET 10.0.8 (10.0.8, 10.0.826.23019), X64 RyuJIT x86-64-v3
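| Method       | Mean     | Error    | StdDev    | Ratio | RatioSD | Gen0   | Allocated | Alloc Ratio |
|------------- |---------:|---------:|----------:|------:|--------:|-------:|----------:|------------:|
| Baseline     | 39.66 μs | 0.394 μs |  0.349 μs |  1.00 |    0.01 | 0.1221 |   1.78 KB |        1.00 |
| Logs         | 36.74 μs | 0.611 μs |  0.572 μs |  0.93 |    0.02 |      - |   2.65 KB |        1.49 |
| AllTelemetry | 93.32 μs | 1.863 μs |  4.011 μs |  2.35 |    0.10 | 0.2441 |   3.33 KB |        1.87 |
| Metrics      | 86.56 μs | 1.678 μs |  4.178 μs |  2.18 |    0.11 |      - |   2.69 KB |        1.51 |
| Traces       | 83.80 μs | 4.613 μs | 13.011 μs |  2.11 |    0.33 | 0.2441 |   3.29 KB |        1.85 |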

.NET 11 preview 4 + 0.16.0-*:

BenchmarkDotNet v0.16.0-nightly.20260513.530, Windows 11 (10.0.26200.8457/25H2/2025Update/HudsonValley2)
13th Gen Intel Core i7-13700H 2.90GHz, 1 CPU, 20 logical and 14 physical cores
Memory: 63.83 GB Total, 37.85 GB Available
.NET SDK 11.0.100-preview.4.26230.115
  [Host]     : .NET 11.0.0 (11.0.0-preview.4.26230.115, 11.0.26.23115), X64 RyuJIT x86-64-v3
  DefaultJob : .NET 11.0.0 (11.0.0-preview.4.26230.115, 11.0.26.23115), X64 RyuJIT x86-64-v3
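| Method       | Mean     | Error   | StdDev   | Median   | Ratio | RatioSD | Gen0   | Allocated | Alloc Ratio |
|------------- |---------:|--------:|---------:|---------:|------:|--------:|-------:|----------:|------------:|
| Baseline     | 120.1 μs | 2.67 μs |  7.74 μs | 123.1 μs |  1.00 |    0.00 | 0.1831 |   2.74 KB |        1.00 |
| Logs         | 120.5 μs | 4.23 μs | 12.47 μs | 125.4 μs |  1.01 |    0.13 | 0.2441 |   3.62 KB |        1.32 |
| AllTelemetry | 133.4 μs | 4.34 μs | 12.66 μs | 137.3 μs |  1.12 |    0.13 | 0.2441 |    4.3 KB |        1.57 |
| Metrics      | 129.2 μs | 3.19 μs |  9.24 μs | 132.1 μs |  1.08 |    0.11 | 0.2441 |   3.66 KB |        1.34 |
| Traces       | 128.7 μs | 4.58 μs | 13.13 μs | 132.0 μs |  1.08 |    0.13 | 0.2441 |   4.26 KB |        1.55 |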

I ran the benchmarks with the EventPipe profiler and asked Copilot to analyse the difference between the two. It returned the following analysis:

The regressions are most likely coming from async benchmark harness overhead in the .NET 11 runs, not from a new hot OpenTelemetry code path.

From the .speedscope.json traces, the common new hotspot in the worst regressions (AspNetCoreBenchmarks, LogBenchmarks, MetricBenchmarks, TraceBenchmarks) is:

BenchmarkDotNetSynchronizationContext.ExecuteUntilComplete → SynchronizationContextAwaitTaskContinuation.PostAction → System.Threading.Lock.Enter/TryEnterSlow and related continuation machinery such as AsyncHelpers.AllocContinuation / AsyncHelpers+RuntimeAsyncTask.DispatchContinuations.

That pattern is essentially absent in the corresponding .NET 10 traces, where time is dominated more by the normal HttpClient/Kestrel request path. It also matches the benchmark shape: all of these scenarios ultimately run the same async SendRequestsAsync() loop with 32 awaited HttpClient.GetAsync() calls per invocation (perf\Benchmarks\Benchmarks.cs:120-131), so extra continuation/synchronization overhead hits every request and shows up as both higher duration and higher allocation.

Two reasons this looks harness-related rather than an OpenTelemetry-specific regression:

  1. The biggest regressions appear even in Baseline runs, where no telemetry is enabled.
  2. The .NET 10 and .NET 11 artifact sets were not produced with the same BenchmarkDotNet host:
    • .NET 10 used BenchmarkDotNet v0.15.8 (BenchmarkRun-20260517-134640.log:25-30)
    • .NET 11 used BenchmarkDotNet v0.16.0-nightly.20260513.530 with a different launch mode (--ipcPort, --diagnoserRunMode 3) (BenchmarkRun-20260517-143602.log:25-31)

The lighter/neutral cases like AwsBenchmarks and SqlServerBenchmarks show far less of that lock/continuation overhead in their traces, which fits: their total request cost is dominated more by external work, so the harness overhead is amortized.

Bottom line: the likely source of the observed .NET 11 regressions is a change in how these async benchmarks are being driven and synchronized under the .NET 11 + BenchmarkDotNet 0.16 nightly combination, with extra continuation posting/locking/allocation in the benchmark host.

I would treat the current .NET 10 vs .NET 11 numbers as confounded by the benchmark harness version/execution model, not as clean evidence of a runtime or OpenTelemetry regression.
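To make the reported hotspot more concrete: when a SynchronizationContext is current, each await posts its continuation back to that context instead of resuming directly on the thread pool, so a per-await queueing cost lands inside the measured method. The sketch below is a hypothetical single-threaded pump written only to illustrate that general pattern. It is not BenchmarkDotNet's BenchmarkDotNetSynchronizationContext, and the class name PumpingSynchronizationContext, its PostCount counter, and the 32-iteration loop (chosen to mirror the 32 awaited calls mentioned above) are assumptions.

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

var context = new PumpingSynchronizationContext();

int sum = context.ExecuteUntilComplete(async () =>
{
    int total = 0;
    for (int i = 0; i < 32; i++)
    {
        // Because a custom context is current, this continuation is posted
        // to the pump's queue rather than scheduled on the thread pool.
        await Task.Yield();
        total += i;
    }
    return total;
});

Console.WriteLine($"sum={sum}, posted continuations={context.PostCount}");

// Hypothetical pump for illustration only; single use (the queue is completed at the end).
public sealed class PumpingSynchronizationContext : SynchronizationContext
{
    private readonly BlockingCollection<(SendOrPostCallback Callback, object? State)> _queue = new();
    private int _postCount;

    public int PostCount => Volatile.Read(ref _postCount);

    // Every awaited continuation that captured this context arrives here.
    public override void Post(SendOrPostCallback d, object? state)
    {
        Interlocked.Increment(ref _postCount);
        _queue.Add((d, state));
    }

    // Starts the async work, then runs queued continuations on the calling
    // thread until the returned task has completed.
    public T ExecuteUntilComplete<T>(Func<Task<T>> work)
    {
        SynchronizationContext? previous = Current;
        SetSynchronizationContext(this);
        try
        {
            Task<T> task = work();

            // Stop the pump once the task finishes; runs on the thread pool.
            task.ContinueWith(_ => _queue.CompleteAdding(), TaskScheduler.Default);

            foreach (var (callback, state) in _queue.GetConsumingEnumerable())
            {
                callback(state);
            }

            return task.GetAwaiter().GetResult();
        }
        finally
        {
            SetSynchronizationContext(previous);
        }
    }
}
```

In this sketch every one of the 32 awaits goes through Post and a queue round trip before the loop can continue. Whether BenchmarkDotNet 0.16 pays a comparable cost per await, and how much of it is attributable to locking, would need to be confirmed against its actual source.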

I then ran the benchmarks again for .NET 10 but using 0.16.0-* to rule out .NET 11 as the source, which gave these results:


BenchmarkDotNet v0.16.0-nightly.20260513.530, Windows 11 (10.0.26200.8457/25H2/2025Update/HudsonValley2)
13th Gen Intel Core i7-13700H 2.90GHz, 1 CPU, 20 logical and 14 physical cores
Memory: 63.83 GB Total, 33.13 GB Available
.NET SDK 10.0.300
  [Host]     : .NET 10.0.8 (10.0.8, 10.0.826.23019), X64 RyuJIT x86-64-v3
  DefaultJob : .NET 10.0.8 (10.0.8, 10.0.826.23019), X64 RyuJIT x86-64-v3


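| Method       | Mean      | Error    | StdDev    | Median    | Ratio | RatioSD | Gen0   | Allocated | Alloc Ratio |
|------------- |----------:|---------:|----------:|----------:|------:|--------:|-------:|----------:|------------:|
| Baseline     |  85.47 μs | 7.672 μs | 22.622 μs |  93.72 μs |  1.00 |    0.00 | 0.1221 |   1.83 KB |        1.00 |
| Logs         |  82.52 μs | 5.980 μs | 16.770 μs |  91.03 μs |  1.06 |    0.44 | 0.1831 |   2.69 KB |        1.47 |
| AllTelemetry | 102.01 μs | 1.433 μs |  1.340 μs | 101.96 μs |  1.31 |    0.46 | 0.2441 |   3.38 KB |        1.85 |
| Metrics      |  96.75 μs | 1.811 μs |  1.606 μs |  96.90 μs |  1.24 |    0.44 | 0.1831 |   2.74 KB |        1.50 |
| Traces       |  96.34 μs | 1.881 μs |  2.984 μs |  96.66 μs |  1.24 |    0.44 | 0.2441 |   3.34 KB |        1.83 |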

The regressions aren't as bad, but they are still present.
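For reference, the slowdown relative to the original .NET 10 + 0.15.8 run can be computed directly from the Mean columns of the three result tables above. The snippet below is only a small arithmetic sketch: the numbers are copied from those tables, while the tuple field names and the Slowdown helper are made up for illustration.

```csharp
using System;
using System.Globalization;

// Mean durations in microseconds, copied from the three result tables above.
var means = new (string Method, double Net10Bdn015, double Net10Bdn016, double Net11Bdn016)[]
{
    ("Baseline",     39.66,  85.47, 120.1),
    ("Logs",         36.74,  82.52, 120.5),
    ("AllTelemetry", 93.32, 102.01, 133.4),
    ("Metrics",      86.56,  96.75, 129.2),
    ("Traces",       83.80,  96.34, 128.7),
};

// Ratio of a candidate mean to the .NET 10 + 0.15.8 mean (1.00 means no change).
static double Slowdown(double baseline, double candidate) => candidate / baseline;

foreach (var row in means)
{
    double harnessOnly = Slowdown(row.Net10Bdn015, row.Net10Bdn016); // same runtime, new BenchmarkDotNet
    double combined = Slowdown(row.Net10Bdn015, row.Net11Bdn016);    // new runtime and new BenchmarkDotNet

    Console.WriteLine(string.Format(
        CultureInfo.InvariantCulture,
        "{0,-12} .NET 10 + 0.16: {1:F2}x   .NET 11 + 0.16: {2:F2}x",
        row.Method, harnessOnly, combined));
}
```

For Baseline and Logs this gives roughly 2.16x and 2.25x on .NET 10 with 0.16, and a little over 3x on .NET 11, while the three telemetry-enabled scenarios move far less in relative terms because their 0.15.8 means were already higher.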

Asking Copilot to compare the profiler output gave the following:

Part of the regression is already present in the new .NET 10 + BenchmarkDotNet 0.16 async run — but not the full .NET 11 effect.

The new run is enough to show that BenchmarkDotNet 0.16’s async execution model changes the measurements on .NET 10 too:

  • It uses the same new harness shape as the .NET 11 run: BenchmarkDotNet v0.16.0-nightly.20260513.530 with --ipcPort and --diagnoserRunMode 3.
  • Several benchmarks regress without changing the runtime:
    • AspNetCoreBenchmarks.Baseline: 2.16x
    • AspNetCoreBenchmarks.Logs: 2.25x
    • LogBenchmarks.Baseline: 1.87x
    • LogBenchmarks.Logs: 2.00x
  • That means a meaningful chunk of the original “.NET 11 regression” was really harness-induced.

But it is not the same regression in full as the .NET 11 traces:

┌────────────────────┬──────────────┬────────────┬─────────┬───────────────┬─────────┐
│ Run │ Sync context │ PostAction │ Lock │ Async helpers │ Http │
├────────────────────┼──────────────┼────────────┼─────────┼───────────────┼─────────┤
│ original .NET 10 │ 0 │ 0 │ 0 │ 0 │ 69,123 │
├────────────────────┼──────────────┼────────────┼─────────┼───────────────┼─────────┤
│ .NET 10 + BDN 0.16 │ 161,313 │ 5,246 │ 0 │ 0 │ 295,993 │
├────────────────────┼──────────────┼────────────┼─────────┼───────────────┼─────────┤
│ .NET 11 + BDN 0.16 │ 431,052 │ 3,952 │ 181,397 │ 32,802 │ 91,169 │
└────────────────────┴──────────────┴────────────┴─────────┴───────────────┴─────────┘

So the conclusion is:

  1. Yes, the BenchmarkDotNet async machinery regression is present on .NET 10. The new .NET 10 traces now spend substantial time in BenchmarkSynchronizationContext.ExecuteUntilComplete / BenchmarkDotNetSynchronizationContext.ExecuteUntilComplete, which was absent in the original .NET 10 run.
  2. No, the .NET 10 rerun does not reproduce the full .NET 11 pathology. The big .NET 11-specific hotspot — System.Threading.Lock.Enter/TryEnterSlow plus extra async continuation machinery — does not appear in the .NET 10 + 0.16 traces.
  3. That means the original .NET 11 regressions were a combination of two effects:
    • Harness effect from BenchmarkDotNet 0.16 async execution, visible now on .NET 10 too.
    • Additional .NET 11-specific amplification, visible as lock contention and continuation overhead in the .NET 11 traces.

Overall, the new .NET 10 run confirms that BenchmarkDotNet async machinery was already contaminating the comparison, but it also shows that .NET 11 still has an extra regression beyond that harness change.

Could the changes in #2958 have introduced a problem, or are these differences expected? They seem quite large for an expected change, particularly when trying to track an application's performance over time (and between versions).

For the .NET 11 results, I also wonder whether runtime async (runtime, aspnetcore), which is new in .NET 11 preview 4, is somehow affecting how things are measured.

I also created a much smaller benchmark with fewer moving parts. It shows a similar regression on .NET 10 alone when comparing 0.15.8 with 0.16.0:

0.15.8:

BenchmarkDotNet v0.15.8, Windows 11 (10.0.26200.8457/25H2/2025Update/HudsonValley2)
13th Gen Intel Core i7-13700H 2.90GHz, 1 CPU, 20 logical and 14 physical cores
.NET SDK 10.0.300
  [Host]     : .NET 10.0.8 (10.0.8, 10.0.826.23019), X64 RyuJIT x86-64-v3
  DefaultJob : .NET 10.0.8 (10.0.8, 10.0.826.23019), X64 RyuJIT x86-64-v3
| Method       | Mean     | Error    | StdDev   | Gen0   | Allocated |
|------------- |---------:|---------:|---------:|-------:|----------:|
| HttpGetAsync | 60.95 μs | 1.080 μs | 0.902 μs | 0.3662 |   5.79 KB |

0.16.0:

BenchmarkDotNet v0.16.0-nightly.20260516.537, Windows 11 (10.0.26200.8457/25H2/2025Update/HudsonValley2)
13th Gen Intel Core i7-13700H 2.90GHz, 1 CPU, 20 logical and 14 physical cores
Memory: 63.83 GB Total, 35.74 GB Available
.NET SDK 10.0.300
  [Host]     : .NET 10.0.8 (10.0.8, 10.0.826.23019), X64 RyuJIT x86-64-v3
  DefaultJob : .NET 10.0.8 (10.0.8, 10.0.826.23019), X64 RyuJIT x86-64-v3
| Method       | Mean     | Error    | StdDev   | Gen0   | Allocated |
|------------- |---------:|---------:|---------:|-------:|----------:|
| HttpGetAsync | 68.30 μs | 1.352 μs | 2.731 μs | 0.3662 |   5.88 KB |
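As a rough check that this smaller difference is not just noise, the two rows can be compared using Mean plus or minus Error. BenchmarkDotNet's legend describes Error as half of the 99.9% confidence interval, so non-overlapping intervals suggest the difference is real. This is only a sketch and not a formal significance test; the helper names Interval and Overlaps are made up for illustration.

```csharp
using System;

// Mean and Error in microseconds, copied from the two HttpGetAsync rows above.
var before = (Mean: 60.95, Error: 1.080); // BenchmarkDotNet 0.15.8
var after = (Mean: 68.30, Error: 1.352);  // BenchmarkDotNet 0.16.0-nightly.20260516.537

// Mean +/- Error, treating Error as the half-width of the confidence interval.
static (double Low, double High) Interval(double mean, double error) =>
    (mean - error, mean + error);

// Two closed intervals overlap when each one starts before the other ends.
static bool Overlaps((double Low, double High) a, (double Low, double High) b) =>
    a.Low <= b.High && b.Low <= a.High;

var oldInterval = Interval(before.Mean, before.Error);
var newInterval = Interval(after.Mean, after.Error);
double relativeChange = (after.Mean - before.Mean) / before.Mean;

Console.WriteLine($"0.15.8 interval: {oldInterval.Low:F2} to {oldInterval.High:F2} us");
Console.WriteLine($"0.16.0 interval: {newInterval.Low:F2} to {newInterval.High:F2} us");
Console.WriteLine($"intervals overlap: {Overlaps(oldInterval, newInterval)}");
Console.WriteLine($"relative change in mean: {relativeChange:P1}");
```

With these numbers the intervals do not overlap and the mean is about 12% higher under 0.16.0, so the regression in the small repro looks genuine even though it is much smaller than in the end-to-end benchmarks.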

Copilot again points towards the async machinery:

Likely source of the regression: the slowdown does not appear to come from HttpClient/HttpListener itself. The traces point to extra async runner overhead in BenchmarkDotNet 0.16.0, especially around synchronization-context and continuation handling for async benchmarks.

What stands out from the traces:

  1. BenchmarkDotNet 0.16.0 introduces a new async execution path:
    • BenchmarkDotNet.Engines.BenchmarkDotNetSynchronizationContext.ExecuteUntilComplete(...)
    • This frame is present in the 0.16.0 trace and not in the 0.15.8 baseline.
  2. After normalizing per benchmark operation, the extra cost shows up in async scheduling / execution-context work, not in the HTTP stack:
    • System.Threading.ExecutionContext.RunInternal(...): 112.5 us/op -> 138.5 us/op (+26.0 us/op)
    • System.Threading.Tasks.AwaitTaskContinuation.RunCallback(...): 0 -> 28.3 us/op
    • HttpBenchmarks+d__7.MoveNext(): 31.7 us/op -> 50.3 us/op (+18.6 us/op)
  3. The actual HTTP request/response pipeline is slightly lower in 0.16.0:
    • HttpClient+d__46.MoveNext(): 34.1 us/op -> 29.7 us/op
    • RedirectHandler+d__4.MoveNext(): 31.6 us/op -> 27.6 us/op
    • HttpConnection+d__56.MoveNext(): 30.2 us/op -> 26.3 us/op
    • Server-side HttpListener completion paths are also a bit lower.

One caveat: the raw trace totals are much larger in 0.16.0 because that run executed more measured iterations (WorkloadActual 50 times vs 16 times in 0.15.8), so total trace time is not directly comparable. After normalizing per operation, the signal is still the same.
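The caveat about iteration counts can be illustrated with the figures quoted above. The sketch below assumes both runs perform the same number of operations per measured iteration, which the source does not state; the opsPerIteration value is a hypothetical placeholder that cancels out of the ratio. The per-operation costs are the ExecutionContext.RunInternal numbers from the list above.

```csharp
using System;
using System.Globalization;

// Per-operation cost of ExecutionContext.RunInternal quoted above (us/op).
const double perOpOld = 112.5; // 0.15.8
const double perOpNew = 138.5; // 0.16.0

// Measured iterations quoted above (WorkloadActual).
const int iterationsOld = 16;
const int iterationsNew = 50;

// ASSUMPTION: identical operations per iteration in both runs.
// The value is hypothetical and cancels out of the ratio below.
const double opsPerIteration = 1_000;

// Raw trace total implied by a per-operation cost and an iteration count.
static double RawTotal(double perOp, int iterations, double opsPerIteration) =>
    perOp * iterations * opsPerIteration;

double rawRatio = RawTotal(perOpNew, iterationsNew, opsPerIteration)
                / RawTotal(perOpOld, iterationsOld, opsPerIteration);
double perOpRatio = perOpNew / perOpOld;

Console.WriteLine(string.Format(CultureInfo.InvariantCulture, "raw total ratio:     {0:F2}x", rawRatio));
Console.WriteLine(string.Format(CultureInfo.InvariantCulture, "per-operation ratio: {0:F2}x", perOpRatio));
```

Under that assumption the raw totals differ by almost 4x while the per-operation cost differs by about 1.23x, which is why only the normalized figures are meaningful for comparing the two versions.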

Conclusion: the observed regression is most likely due to BenchmarkDotNet 0.16.0’s new async benchmark execution/pumping path adding continuation and ExecutionContext overhead, rather than a regression in the underlying networking code.

Benchmarks.csproj

<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <ImplicitUsings>enable</ImplicitUsings>
    <Nullable>disable</Nullable>
    <OutputType>Exe</OutputType>
    <TargetFramework>net10.0</TargetFramework>
  </PropertyGroup>
  <ItemGroup>
    <!--
    <PackageReference Include="BenchmarkDotNet" Version="0.15.8" />
    -->
    <PackageReference Include="BenchmarkDotNet" Version="0.16.0-nightly.20260516.537" />
  </ItemGroup>
</Project>

NuGet.config

<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <packageSources>
    <clear />
    <add key="BenchmarkDotNet" value="https://www.myget.org/F/benchmarkdotnet/api/v3/index.json" />
    <add key="NuGet" value="https://api.nuget.org/v3/index.json" />
  </packageSources>
</configuration>

Program.cs

using System.Net;
using System.Net.Sockets;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

BenchmarkRunner.Run<HttpBenchmarks>(args: args);

public class HttpBenchmarks
{
    private static readonly byte[] ResponseBody = "Hello, world!"u8.ToArray();

    private HttpListener _listener = null!;
    private HttpClient _client = null!;
    private CancellationTokenSource _cancellation = null!;
    private Task _serverTask = Task.CompletedTask;

    [GlobalSetup]
    public void Setup()
    {
        var prefix = $"http://127.0.0.1:{GetFreePort()}/";

        _listener = new HttpListener();
        _listener.Prefixes.Add(prefix);
        _listener.Start();

        _cancellation = new CancellationTokenSource();
        _serverTask = Task.Run(() => RunServerAsync(_cancellation.Token));

        _client = new HttpClient()
        {
            BaseAddress = new Uri(prefix),
        };
    }

    [GlobalCleanup]
    public async Task CleanupAsync()
    {
        _cancellation.Cancel();
        _listener.Close();

        try
        {
            await _serverTask;
        }
        catch (Exception)
        {
        }

        _client.Dispose();
        _cancellation.Dispose();
    }

    [Benchmark]
    public async Task<int> HttpGetAsync()
    {
        var response = await _client.GetByteArrayAsync(string.Empty);
        return response.Length;
    }

    private async Task RunServerAsync(CancellationToken cancellationToken)
    {
        while (!cancellationToken.IsCancellationRequested)
        {
            HttpListenerContext context;

            try
            {
                context = await _listener.GetContextAsync().WaitAsync(cancellationToken);
            }
            catch (OperationCanceledException)
            {
                break;
            }
            catch (HttpListenerException) when (cancellationToken.IsCancellationRequested)
            {
                break;
            }
            catch (ObjectDisposedException) when (cancellationToken.IsCancellationRequested)
            {
                break;
            }

            var response = context.Response;
            response.StatusCode = (int)HttpStatusCode.OK;
            response.ContentLength64 = ResponseBody.Length;
            await response.OutputStream.WriteAsync(ResponseBody, cancellationToken);
            response.Close();
        }
    }

    private static int GetFreePort()
    {
        using var listener = new TcpListener(IPAddress.Loopback, 0);
        listener.Start();
        return ((IPEndPoint)listener.LocalEndpoint).Port;
    }
}

global.json

{
  "sdk": {
    "version": "10.0.300",
    "allowPrerelease": false
  }
}
