Async benchmarks show regressions using 0.16.0 prereleases (#3139)

I've been benchmarking some applications to compare .NET 10 with .NET 11. This has required me to use 0.16.0 prereleases, because 0.15.8 does not support apps targeting `net11.0`. These are end-to-end benchmarks rather than microbenchmarks (e.g. a whole HTTP request), and they involve async code.

The benchmarks essentially look like the code below. The core of each benchmark is in a private method so that it can be reused across benchmarks with different setups:

```csharp
[Benchmark]
public Task<int> SomeScenarioName() => SendRequestsAsync();

private async Task<int> SendRequestsAsync()
{
    int statusCode = 0;

    for (int i = 0; i < OperationsPerInvoke; i++)
    {
        using var response = await _client!.GetAsync(Endpoint, HttpCompletionOption.ResponseHeadersRead);
        response.EnsureSuccessStatusCode();
        statusCode += (int)response.StatusCode;
    }

    return statusCode;
}
```

Comparing the two appears to show large regressions in both duration and allocations. For example:

.NET 10 + 0.15.8:

```text
BenchmarkDotNet v0.15.8, Windows 11 (10.0.26200.8457/25H2/2025Update/HudsonValley2)
13th Gen Intel Core i7-13700H 2.90GHz, 1 CPU, 20 logical and 14 physical cores
.NET SDK 10.0.300
  [Host]     : .NET 10.0.8 (10.0.8, 10.0.826.23019), X64 RyuJIT x86-64-v3
  DefaultJob : .NET 10.0.8 (10.0.8, 10.0.826.23019), X64 RyuJIT x86-64-v3
```

| Method       | Mean     | Error    | StdDev    | Ratio | RatioSD | Gen0   | Allocated | Alloc Ratio |
|------------- |---------:|---------:|----------:|------:|--------:|-------:|----------:|------------:|
| Baseline     | 39.66 μs | 0.394 μs |  0.349 μs |  1.00 |    0.01 | 0.1221 |   1.78 KB |        1.00 |
| Logs         | 36.74 μs | 0.611 μs |  0.572 μs |  0.93 |    0.02 |      - |   2.65 KB |        1.49 |
| AllTelemetry | 93.32 μs | 1.863 μs |  4.011 μs |  2.35 |    0.10 | 0.2441 |   3.33 KB |        1.87 |
| Metrics      | 86.56 μs | 1.678 μs |  4.178 μs |  2.18 |    0.11 |      - |   2.69 KB |        1.51 |
| Traces       | 83.80 μs | 4.613 μs | 13.011 μs |  2.11 |    0.33 | 0.2441 |   3.29 KB |        1.85 |

.NET 11 preview 4 + 0.16.0-*:

```text
BenchmarkDotNet v0.16.0-nightly.20260513.530, Windows 11 (10.0.26200.8457/25H2/2025Update/HudsonValley2)
13th Gen Intel Core i7-13700H 2.90GHz, 1 CPU, 20 logical and 14 physical cores
Memory: 63.83 GB Total, 37.85 GB Available
.NET SDK 11.0.100-preview.4.26230.115
  [Host]     : .NET 11.0.0 (11.0.0-preview.4.26230.115, 11.0.26.23115), X64 RyuJIT x86-64-v3
  DefaultJob : .NET 11.0.0 (11.0.0-preview.4.26230.115, 11.0.26.23115), X64 RyuJIT x86-64-v3
```

| Method       | Mean     | Error   | StdDev   | Median   | Ratio | RatioSD | Gen0   | Allocated | Alloc Ratio |
|------------- |---------:|--------:|---------:|---------:|------:|--------:|-------:|----------:|------------:|
| Baseline     | 120.1 μs | 2.67 μs |  7.74 μs | 123.1 μs |  1.00 |    0.00 | 0.1831 |   2.74 KB |        1.00 |
| Logs         | 120.5 μs | 4.23 μs | 12.47 μs | 125.4 μs |  1.01 |    0.13 | 0.2441 |   3.62 KB |        1.32 |
| AllTelemetry | 133.4 μs | 4.34 μs | 12.66 μs | 137.3 μs |  1.12 |    0.13 | 0.2441 |    4.3 KB |        1.57 |
| Metrics      | 129.2 μs | 3.19 μs |  9.24 μs | 132.1 μs |  1.08 |    0.11 | 0.2441 |   3.66 KB |        1.34 |
| Traces       | 128.7 μs | 4.58 μs | 13.13 μs | 132.0 μs |  1.08 |    0.13 | 0.2441 |   4.26 KB |        1.55 |

I ran the benchmarks with the EventPipe profiler and asked Copilot to analyse the differences between the two runs. It returned the following analysis:

> The regressions are most likely coming from async benchmark harness overhead in the .NET 11 runs, not from a new hot OpenTelemetry code path.
> 
> From the .speedscope.json traces, the common new hotspot in the worst regressions (AspNetCoreBenchmarks, LogBenchmarks, MetricBenchmarks, TraceBenchmarks) is:
> 
> BenchmarkDotNetSynchronizationContext.ExecuteUntilComplete → SynchronizationContextAwaitTaskContinuation.PostAction → System.Threading.Lock.Enter/TryEnterSlow and related continuation machinery such as AsyncHelpers.AllocContinuation / AsyncHelpers+RuntimeAsyncTask.DispatchContinuations.
> 
> That pattern is essentially absent in the corresponding .NET 10 traces, where time is dominated more by the normal HttpClient/Kestrel request path. It also matches the benchmark shape: all of these scenarios ultimately run the same async SendRequestsAsync() loop with 32 awaited HttpClient.GetAsync() calls per invocation (perf\Benchmarks\Benchmarks.cs:120-131), so extra continuation/synchronization overhead hits every request and shows up as both higher duration and higher allocation.
> 
> Two reasons this looks harness-related rather than an OpenTelemetry-specific regression:
> 
>  1. The biggest regressions appear even in Baseline runs, where no telemetry is enabled.
>  2. The .NET 10 and .NET 11 artifact sets were not produced with the same BenchmarkDotNet host:
>     - .NET 10 used BenchmarkDotNet v0.15.8 (BenchmarkRun-20260517-134640.log:25-30)
>     - .NET 11 used BenchmarkDotNet v0.16.0-nightly.20260513.530 with a different launch mode (--ipcPort, --diagnoserRunMode 3) (BenchmarkRun-20260517-143602.log:25-31)
> 
> The lighter/neutral cases like AwsBenchmarks and SqlServerBenchmarks show far less of that lock/continuation overhead in their traces, which fits: their total request cost is dominated more by external work, so the harness overhead is amortized.
> 
> Bottom line: the likely source of the observed .NET 11 regressions is a change in how these async benchmarks are being driven and synchronized under the .NET 11 + BenchmarkDotNet 0.16 nightly combination, with extra continuation posting/locking/allocation in the benchmark host.
>
> I would treat the current .NET 10 vs .NET 11 numbers as confounded by the benchmark harness version/execution model, not as clean evidence of a runtime or OpenTelemetry regression.
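
For anyone wanting to reproduce the traces, BenchmarkDotNet's EventPipe profiler can be enabled with an attribute on the benchmark class; it exports the `.speedscope.json` files referred to above. A minimal sketch (the `CpuSampling` profile is an assumption, and the class body is elided):

```csharp
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Diagnosers;

// Collects a CPU-sampling EventPipe trace for each benchmark in this class.
[EventPipeProfiler(EventPipeProfile.CpuSampling)]
public class HttpBenchmarks
{
    // ... [GlobalSetup], [Benchmark] and [GlobalCleanup] members as in Program.cs ...
}
```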

I then re-ran the .NET 10 benchmarks using 0.16.0-* to rule out .NET 11 as the source of the regression, which gave these results:

```text
BenchmarkDotNet v0.16.0-nightly.20260513.530, Windows 11 (10.0.26200.8457/25H2/2025Update/HudsonValley2)
13th Gen Intel Core i7-13700H 2.90GHz, 1 CPU, 20 logical and 14 physical cores
Memory: 63.83 GB Total, 33.13 GB Available
.NET SDK 10.0.300
  [Host]     : .NET 10.0.8 (10.0.8, 10.0.826.23019), X64 RyuJIT x86-64-v3
  DefaultJob : .NET 10.0.8 (10.0.8, 10.0.826.23019), X64 RyuJIT x86-64-v3
```

| Method       | Mean      | Error    | StdDev    | Median    | Ratio | RatioSD | Gen0   | Allocated | Alloc Ratio |
|------------- |----------:|---------:|----------:|----------:|------:|--------:|-------:|----------:|------------:|
| Baseline     |  85.47 μs | 7.672 μs | 22.622 μs |  93.72 μs |  1.00 |    0.00 | 0.1221 |   1.83 KB |        1.00 |
| Logs         |  82.52 μs | 5.980 μs | 16.770 μs |  91.03 μs |  1.06 |    0.44 | 0.1831 |   2.69 KB |        1.47 |
| AllTelemetry | 102.01 μs | 1.433 μs |  1.340 μs | 101.96 μs |  1.31 |    0.46 | 0.2441 |   3.38 KB |        1.85 |
| Metrics      |  96.75 μs | 1.811 μs |  1.606 μs |  96.90 μs |  1.24 |    0.44 | 0.1831 |   2.74 KB |        1.50 |
| Traces       |  96.34 μs | 1.881 μs |  2.984 μs |  96.66 μs |  1.24 |    0.44 | 0.2441 |   3.34 KB |        1.83 |

The regressions aren't as bad, but still present.
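
Taking the Baseline means from the three tables at face value (this ignores the error bars, and both 0.16.0 runs have a much larger StdDev), the overall Baseline slowdown factors into a harness part and a runtime part:

$$
\frac{120.1\ \mu\text{s}}{39.66\ \mu\text{s}}
= \underbrace{\frac{85.47}{39.66}}_{\text{.NET 10: BDN 0.15.8}\,\to\,\text{0.16.0}}
\times \underbrace{\frac{120.1}{85.47}}_{\text{BDN 0.16.0: .NET 10}\,\to\,\text{.NET 11}}
\approx 2.155 \times 1.405 \approx 3.03
$$

So on these numbers, roughly a 2.2x Baseline slowdown comes from the BenchmarkDotNet change alone, with a further ~1.4x between .NET 10 and .NET 11 under the same harness.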

Asking Copilot to compare the profiler output gave the following:

> Part of the regression is already present in the new .NET 10 + BenchmarkDotNet 0.16 async run — but not the full .NET 11 effect.
> 
>   The new run is enough to show that BenchmarkDotNet 0.16’s async execution model changes the measurements on .NET 10 too:
> 
>    - It uses the same new harness shape as the .NET 11 run: BenchmarkDotNet v0.16.0-nightly.20260513.530 with --ipcPort and --diagnoserRunMode 3.
>    - Several benchmarks regress without changing the runtime:
>      - AspNetCoreBenchmarks.Baseline: 2.16x
>      - AspNetCoreBenchmarks.Logs: 2.25x
>      - LogBenchmarks.Baseline: 1.87x
>      - LogBenchmarks.Logs: 2.00x
>    - That means a meaningful chunk of the original “.NET 11 regression” was really harness-induced.
> 
>   But it is not the same regression in full as the .NET 11 traces:
> 
> | Run                | Sync context | PostAction | Lock    | Async helpers | Http    |
> |--------------------|-------------:|-----------:|--------:|--------------:|--------:|
> | original .NET 10   | 0            | 0          | 0       | 0             | 69,123  |
> | .NET 10 + BDN 0.16 | 161,313      | 5,246      | 0       | 0             | 295,993 |
> | .NET 11 + BDN 0.16 | 431,052      | 3,952      | 181,397 | 32,802        | 91,169  |
> 
>   So the conclusion is:
> 
>    1. Yes, the BenchmarkDotNet async machinery regression is present on .NET 10. The new .NET 10 traces now spend substantial time in BenchmarkSynchronizationContext.ExecuteUntilComplete / BenchmarkDotNetSynchronizationContext.ExecuteUntilComplete, which was absent in the original .NET 10 run.
>    2. No, the .NET 10 rerun does not reproduce the full .NET 11 pathology. The big .NET 11-specific hotspot — System.Threading.Lock.Enter/TryEnterSlow plus extra async continuation machinery — does not appear in the .NET 10 + 0.16 traces.
>    3. That means the original .NET 11 regressions were a combination of two effects:
>       - Harness effect from BenchmarkDotNet  0.16 async execution, visible now on .NET 10 too.
>       - Additional .NET 11-specific amplification, visible as lock contention and continuation overhead in the .NET 11 traces.
> 
>   Overall, the new .NET 10 run confirms that BenchmarkDotNet async machinery was already contaminating the comparison, but it also shows that .NET 11 still has an extra regression beyond that harness change.
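
To make the mechanism in this analysis more concrete, here is a minimal sketch of a single-threaded `SynchronizationContext` that queues posted continuations under a lock and pumps them on the calling thread until the task completes. It is a hypothetical illustration (`PumpingSynchronizationContext` and `RunUntilComplete` are made-up names), not BenchmarkDotNet's actual `BenchmarkDotNetSynchronizationContext`:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical illustration only; NOT BenchmarkDotNet's BenchmarkDotNetSynchronizationContext.
// Continuations posted to this context are queued under a lock and executed
// on the thread that calls RunUntilComplete.
public sealed class PumpingSynchronizationContext : SynchronizationContext
{
    private readonly Queue<(SendOrPostCallback Callback, object State)> _queue = new();
    private readonly object _gate = new();

    // Awaits that capture this context (the default behaviour) post their continuations here.
    public override void Post(SendOrPostCallback d, object state)
    {
        lock (_gate) // every posted continuation takes this lock
        {
            _queue.Enqueue((d, state));
            Monitor.Pulse(_gate); // wake the pumping thread if it is waiting
        }
    }

    // Runs the async delegate with this context installed and pumps posted
    // continuations on the calling thread until the returned task completes.
    public static T RunUntilComplete<T>(Func<Task<T>> func)
    {
        SynchronizationContext previous = Current;
        var context = new PumpingSynchronizationContext();
        SetSynchronizationContext(context);
        try
        {
            Task<T> task = func();

            // Post a no-op once the task completes so a waiting pump always wakes up.
            task.ContinueWith(completed => context.Post(static state => { }, null), TaskScheduler.Default);

            while (!task.IsCompleted)
            {
                (SendOrPostCallback Callback, object State) item;
                lock (context._gate)
                {
                    while (context._queue.Count == 0 && !task.IsCompleted)
                    {
                        Monitor.Wait(context._gate);
                    }

                    if (context._queue.Count == 0)
                    {
                        break; // task completed and nothing is left to run
                    }

                    item = context._queue.Dequeue();
                }

                item.Callback(item.State); // run the continuation on this thread
            }

            return task.GetAwaiter().GetResult();
        }
        finally
        {
            SetSynchronizationContext(previous);
        }
    }
}
```

Under a context like this, each awaited `GetAsync()` call in `SendRequestsAsync()` resumes via `Post` and the queue lock instead of continuing directly on the thread that completed the I/O. That is consistent with, though it does not prove, the `PostAction`/`ExecuteUntilComplete` frames in the traces.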

Could the changes in #2958 have introduced an issue, or are these differences expected? They seem quite large for an expected change, particularly when trying to track an application's performance over time (and across versions).

For the .NET 11 results, I also wonder whether runtime async ([runtime](https://github.com/dotnet/core/blob/main/release-notes/11.0/preview/preview4/runtime.md#runtime-libraries-are-now-compiled-with-runtime-async), [aspnetcore](https://github.com/dotnet/core/blob/main/release-notes/11.0/preview/preview4/aspnetcore.md#runtime-async-enabled-for-shared-framework-libraries)), which is new in .NET 11 preview 4, is somehow affecting how things are measured.

I also created a much smaller benchmark with fewer moving parts, which shows a similar regression when comparing 0.15.8 and 0.16.0 on .NET 10 alone:

0.15.8:

```text
BenchmarkDotNet v0.15.8, Windows 11 (10.0.26200.8457/25H2/2025Update/HudsonValley2)
13th Gen Intel Core i7-13700H 2.90GHz, 1 CPU, 20 logical and 14 physical cores
.NET SDK 10.0.300
  [Host]     : .NET 10.0.8 (10.0.8, 10.0.826.23019), X64 RyuJIT x86-64-v3
  DefaultJob : .NET 10.0.8 (10.0.8, 10.0.826.23019), X64 RyuJIT x86-64-v3
```

| Method       | Mean     | Error    | StdDev   | Gen0   | Allocated |
|------------- |---------:|---------:|---------:|-------:|----------:|
| HttpGetAsync | 60.95 μs | 1.080 μs | 0.902 μs | 0.3662 |   5.79 KB |

0.16.0:

```text
BenchmarkDotNet v0.16.0-nightly.20260516.537, Windows 11 (10.0.26200.8457/25H2/2025Update/HudsonValley2)
13th Gen Intel Core i7-13700H 2.90GHz, 1 CPU, 20 logical and 14 physical cores
Memory: 63.83 GB Total, 35.74 GB Available
.NET SDK 10.0.300
  [Host]     : .NET 10.0.8 (10.0.8, 10.0.826.23019), X64 RyuJIT x86-64-v3
  DefaultJob : .NET 10.0.8 (10.0.8, 10.0.826.23019), X64 RyuJIT x86-64-v3
```

| Method       | Mean     | Error    | StdDev   | Gen0   | Allocated |
|------------- |---------:|---------:|---------:|-------:|----------:|
| HttpGetAsync | 68.30 μs | 1.352 μs | 2.731 μs | 0.3662 |   5.88 KB |

Copilot again points towards the async machinery:

> Likely source of the regression: the slowdown does not appear to come from HttpClient/HttpListener itself. The traces point to extra async runner overhead in
> BenchmarkDotNet 0.16.0, especially around synchronization-context and continuation handling for async benchmarks.
> 
> What stands out from the traces:
> 
>  1. BenchmarkDotNet 0.16.0 introduces a new async execution path:
>     - BenchmarkDotNet.Engines.BenchmarkDotNetSynchronizationContext.ExecuteUntilComplete(...)
>     - This frame is present in the 0.16.0 trace and not in the 0.15.8 baseline.
>  2. After normalizing per benchmark operation, the extra cost shows up in async scheduling / execution-context work, not in the HTTP stack:
>     - System.Threading.ExecutionContext.RunInternal(...): 112.5 us/op -> 138.5 us/op (+26.0 us/op)
>     - System.Threading.Tasks.AwaitTaskContinuation.RunCallback(...): 0 -> 28.3 us/op
>     - HttpBenchmarks+<HttpGetAsync>d__7.MoveNext(): 31.7 us/op -> 50.3 us/op (+18.6 us/op)
>  3. The actual HTTP request/response pipeline is slightly lower in 0.16.0:
>      - HttpClient+<GetByteArrayAsyncCore>d__46.MoveNext(): 34.1 us/op -> 29.7 us/op
>      - RedirectHandler+<SendAsync>d__4.MoveNext(): 31.6 us/op -> 27.6 us/op
>      - HttpConnection+<SendAsync>d__56.MoveNext(): 30.2 us/op -> 26.3 us/op
>      - Server-side HttpListener completion paths are also a bit lower.
> 
> One caveat: the raw trace totals are much larger in 0.16.0 because that run executed more measured iterations (WorkloadActual 50 times vs 16 times in 0.15.8), so total trace time is not directly comparable. After normalizing per operation, the signal is still the same.
> 
> Conclusion: the observed regression is most likely due to BenchmarkDotNet 0.16.0’s new async benchmark execution/pumping path adding continuation and ExecutionContext overhead, rather than a regression in the underlying networking code.
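
One way to check the continuation-posting theory in isolation: if continuations are being posted back to a synchronization context installed by the harness, awaiting with `ConfigureAwait(false)` should skip that `Post`. The self-contained sketch below (no BenchmarkDotNet involved; `CountingSynchronizationContext` and `ConfigureAwaitDemo` are hypothetical names) counts `Post` calls with and without it:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: counts Post calls and runs each posted callback on the
// thread pool with this context installed, so later awaits capture it again.
public sealed class CountingSynchronizationContext : SynchronizationContext
{
    private int _posts;

    public int Posts => Volatile.Read(ref _posts);

    public override void Post(SendOrPostCallback d, object state)
    {
        Interlocked.Increment(ref _posts);
        ThreadPool.QueueUserWorkItem(_ =>
        {
            SetSynchronizationContext(this);
            try
            {
                d(state);
            }
            finally
            {
                SetSynchronizationContext(null);
            }
        });
    }
}

public static class ConfigureAwaitDemo
{
    // Awaits several timer-based delays, optionally opting out of context capture.
    public static async Task AwaitManyAsync(int count, bool continueOnCapturedContext)
    {
        for (int i = 0; i < count; i++)
        {
            await Task.Delay(5).ConfigureAwait(continueOnCapturedContext);
        }
    }

    // Runs AwaitManyAsync with a counting context installed and returns the Post count.
    public static int CountPosts(int count, bool continueOnCapturedContext)
    {
        SynchronizationContext previous = SynchronizationContext.Current;
        var context = new CountingSynchronizationContext();
        SynchronizationContext.SetSynchronizationContext(context);
        try
        {
            // Blocking is safe here: continuations run on the thread pool, not this thread.
            AwaitManyAsync(count, continueOnCapturedContext).GetAwaiter().GetResult();
        }
        finally
        {
            SynchronizationContext.SetSynchronizationContext(previous);
        }

        return context.Posts;
    }
}
```

Applying the same idea to `HttpGetAsync` (awaiting `_client.GetByteArrayAsync(string.Empty).ConfigureAwait(false)`) and comparing 0.15.8 against 0.16.0 might show how much of the difference comes from continuation posting. This is untested against 0.16.0, and the harness may still post when it awaits the task returned by the benchmark method itself.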

### Benchmarks.csproj

```xml
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <ImplicitUsings>enable</ImplicitUsings>
    <Nullable>disable</Nullable>
    <OutputType>Exe</OutputType>
    <TargetFramework>net10.0</TargetFramework>
  </PropertyGroup>
  <ItemGroup>
    
    <PackageReference Include="BenchmarkDotNet" Version="0.16.0-nightly.20260516.537" />
  </ItemGroup>
</Project>
```

### NuGet.config

```xml
<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <packageSources>
    <clear />
    <add key="BenchmarkDotNet" value="https://www.myget.org/F/benchmarkdotnet/api/v3/index.json" />
    <add key="NuGet" value="https://api.nuget.org/v3/index.json" />
  </packageSources>
</configuration>
```

### Program.cs

```csharp
using System.Net;
using System.Net.Sockets;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

BenchmarkRunner.Run<HttpBenchmarks>(args: args);

public class HttpBenchmarks
{
    private static readonly byte[] ResponseBody = "Hello, world!"u8.ToArray();

    private HttpListener _listener = null!;
    private HttpClient _client = null!;
    private CancellationTokenSource _cancellation = null!;
    private Task _serverTask = Task.CompletedTask;

    [GlobalSetup]
    public void Setup()
    {
        var prefix = $"http://127.0.0.1:{GetFreePort()}/";

        _listener = new HttpListener();
        _listener.Prefixes.Add(prefix);
        _listener.Start();

        _cancellation = new CancellationTokenSource();
        _serverTask = Task.Run(() => RunServerAsync(_cancellation.Token));

        _client = new HttpClient()
        {
            BaseAddress = new Uri(prefix),
        };
    }

    [GlobalCleanup]
    public async Task CleanupAsync()
    {
        _cancellation.Cancel();
        _listener.Close();

        try
        {
            await _serverTask;
        }
        catch (Exception)
        {
        }

        _client.Dispose();
        _cancellation.Dispose();
    }

    [Benchmark]
    public async Task<int> HttpGetAsync()
    {
        var response = await _client.GetByteArrayAsync(string.Empty);
        return response.Length;
    }

    private async Task RunServerAsync(CancellationToken cancellationToken)
    {
        while (!cancellationToken.IsCancellationRequested)
        {
            HttpListenerContext context;

            try
            {
                context = await _listener.GetContextAsync().WaitAsync(cancellationToken);
            }
            catch (OperationCanceledException)
            {
                break;
            }
            catch (HttpListenerException) when (cancellationToken.IsCancellationRequested)
            {
                break;
            }
            catch (ObjectDisposedException) when (cancellationToken.IsCancellationRequested)
            {
                break;
            }

            var response = context.Response;
            response.StatusCode = (int)HttpStatusCode.OK;
            response.ContentLength64 = ResponseBody.Length;
            await response.OutputStream.WriteAsync(ResponseBody, cancellationToken);
            response.Close();
        }
    }

    private static int GetFreePort()
    {
        using var listener = new TcpListener(IPAddress.Loopback, 0);
        listener.Start();
        return ((IPEndPoint)listener.LocalEndpoint).Port;
    }
}
```

### global.json

```json
{
  "sdk": {
    "version": "10.0.300",
    "allowPrerelease": false
  }
}
```
