Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
104 changes: 104 additions & 0 deletions docs/Rules/EPC38.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,104 @@
# EPC38 - Do not re-enumerate `IEnumerable<Task>`

Detects when a deferred `IEnumerable<Task>` / `IEnumerable<Task<T>>` is observed (enumerated) more than once
inside the same method body.

## Description

The standard way to "fan out" async work in .NET is to build a sequence of tasks from some input and then
wait for all of them:

```csharp
var tasks = ids.Select(async id => await DoWorkAsync(id));
await Task.WhenAll(tasks);
```

This *looks* like it produces a stable collection, but `Select` returns a **deferred** `IEnumerable<Task<T>>`:
every enumeration restarts the projection and creates a fresh set of tasks. So this seemingly innocent code:

```csharp
var tasks = ids.Select(async id => await DoWorkAsync(id));
await Task.WhenAll(tasks);
foreach (var t in tasks) // ❌ EPC38
{
Process(await t);
}
```

…does **not** process the results of the tasks we awaited. It starts a brand-new set of `DoWorkAsync` calls,
awaits each one again, and processes those. The work is duplicated, side effects fire twice, and timing-
sensitive behavior breaks in confusing ways.

Materializing the sequence up front fixes the bug:

```csharp
var tasks = ids.Select(async id => await DoWorkAsync(id)).ToArray();
await Task.WhenAll(tasks);
foreach (var t in tasks) // ✅ tasks is Task<T>[] — safe to re-enumerate
{
Process(await t);
}
```

The analyzer flags **any** symbol whose static type is `IEnumerable<Task>`, `IEnumerable<Task<T>>`,
`IEnumerable<ValueTask>`, or `IEnumerable<ValueTask<T>>` that is enumerated more than once inside the same
method body. Both observations matter — the rule does not require the first one to be `Task.WhenAll`.

## Code that triggers the analyzer

```csharp
async Task FooAsync(IEnumerable<int> ids)
{
var tasks = ids.Select(async id => { await Task.Delay(1); return id; });
await Task.WhenAll(tasks);
foreach (var t in tasks) { await t; } // ❌ EPC38
}

void Counts(IEnumerable<Task<int>> tasks)
{
var count = tasks.Count();
var sum = tasks.Sum(t => t.Result); // ❌ EPC38
}

async Task ManualAsync(IEnumerable<Task> tasks)
{
foreach (var t in tasks) { await t; }
foreach (var t in tasks) { _ = t.Status; } // ❌ EPC38 (strict superset case)
}
```

## How to fix

Materialize the sequence once and use the concrete collection from then on:

```csharp
async Task FooAsync(IEnumerable<int> ids)
{
var tasks = ids.Select(async id => { await Task.Delay(1); return id; }).ToArray();
await Task.WhenAll(tasks);
foreach (var t in tasks) { await t; } // ✅ Task<int>[] — safe
}
```

`ToList()`, `ToArray()`, and similar materializing operators all work. If the source is a parameter and you
control the API, prefer to take `IReadOnlyList<Task<T>>` or `Task<T>[]` instead of `IEnumerable<Task<T>>`,
which both communicates intent and makes the rule fall silent.

## When NOT to suppress

In almost every case this is a real bug. Suppressing it usually means you've convinced yourself the producer
is *not* deferred — in which case the right fix is to change the static type of the variable to the
materialized type (`IReadOnlyList<Task<T>>`, `Task<T>[]`, etc.) so future readers don't have to make the
same leap.

## Limitations (v1)

- Analysis is scoped to a single method body. The rule does not track tasks passed across method boundaries.
- Field and property accesses are not tracked.
- Lambdas / local functions are analyzed as independent scopes.
- Re-assignment (`tasks = freshSource;`) resets tracking from that point onward.

## See also

- [CA1851: Possible multiple enumerations of `IEnumerable` collection](https://learn.microsoft.com/en-us/dotnet/fundamentals/code-analysis/quality-rules/ca1851) — the general-purpose rule. EPC38 specifically targets the task-producing case where re-enumeration is almost always a bug, not just a performance issue.
- [EPC39](EPC39.md) — quadratic enumeration inside a loop.
98 changes: 98 additions & 0 deletions docs/Rules/EPC39.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,98 @@
# EPC39 - Same enumerable iterated more than once inside a loop

Detects three patterns where the *same* `IEnumerable` is iterated more than once inside an outer loop,
producing O(N²) (or worse) behavior.

## Description

Iterating the same source twice inside a loop turns an O(N) walk into an O(N·M) one. When the inner pass
is a deferred LINQ chain, every outer iteration also re-runs every chained filter / projection.

The analyzer recognizes three sub-patterns and reports each with a tailored message:

### Q1 — Same-symbol nested enumeration

```csharp
foreach (var x in coll)
{
var n = coll.Count(); // ❌ EPC39 — Count() re-iterates the entire collection on every step
}
```

### Q2 — Deferred IEnumerable<T> re-enumerated in a loop

```csharp
IEnumerable<int> q = source.Where(x => x > 0);
foreach (var i in outer)
{
var c = q.Count(); // ❌ EPC39 — Where(...) re-runs every iteration
}
```

### Q4 — Nested foreach over the same source

```csharp
foreach (var x in coll)
{
foreach (var y in coll) { ... } // ❌ EPC39 — O(N²)
}
```

## How to fix

### Q1 / Q2 — materialize once, or use a different shape

```csharp
var arr = coll.ToArray(); // or .ToList()
foreach (var x in arr)
{
var n = arr.Length; // O(1)
}
```

If the inner work is genuinely "count every element matching X", pre-compute it outside the loop or replace
the loop body with a single LINQ aggregation that performs one pass.

### Q4 — symmetric problems usually benefit from a hash join

```csharp
var lookup = new HashSet<int>(coll);
foreach (var x in coll)
{
if (lookup.Contains(-x)) { ... }
}
```

## When NOT to suppress

If the data set is *known* to be small (say, ≤ 16 elements) and the loop is on a hot path with a tight
cache footprint, a linear scan can actually be faster than a HashSet lookup. In those cases prefer to
suppress at the call site with a comment explaining the choice:

```csharp
#pragma warning disable EPC39 // List is bounded to <=8 elements; HashSet has higher constant factor.
```

## Patterns intentionally NOT flagged

- Calling `List<T>.Contains`, `Array.IndexOf`, `List<T>.FindIndex`, etc. inside a loop on a collection
that is **not** the loop's source. The cost is O(N·M), but in real codebases these patterns are
frequently correct on small inputs and previously produced too much noise. Use
[CA1851](https://learn.microsoft.com/en-us/dotnet/fundamentals/code-analysis/quality-rules/ca1851)
if you want broader "any double-enumeration" coverage, or [EPC23](EPC23.md) for the specific
`HashSet.Contains`-via-LINQ misuse.
- `ImmutableArray<T>.Contains` and `Span<T>.IndexOf` — they don't allocate and are typically used on
tiny inputs.

## Limitations (v2)

- Analysis is scoped to a single method body. Helper calls that internally enumerate are not tracked.
- Q1/Q2 LINQ root resolution stops at the first non-chain call. A chain like
`GetItems().Where(...).Count()` is not flagged unless the root is a local/parameter (because resolving
the source through arbitrary method bodies is unsound).

## See also

- [CA1851: Possible multiple enumerations of `IEnumerable` collection](https://learn.microsoft.com/en-us/dotnet/fundamentals/code-analysis/quality-rules/ca1851) — the general-purpose rule. EPC39 is more aggressive in the *quadratic* dimension (it fires inside loops) but narrower than CA1851 in scope (it only fires when the source is the same enumerable).
- [EPC38](EPC38.md) — re-enumeration of `IEnumerable<Task>` (correctness bug).
- [EPC23](EPC23.md) — `Enumerable.Contains` on a `HashSet<T>` (a related "wrong tool for the job" pattern).
147 changes: 147 additions & 0 deletions docs/Rules/EPC40.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,147 @@
# EPC40 - Multiple enumeration of deferred query passed to private method

Detects a cross-method variant of the "multiple enumeration" bug that CA1851 cannot diagnose:
a private method enumerates its `IEnumerable<T>` parameter more than once **and** a caller passes a
deferred LINQ query (or iterator method, or `Enumerable.Range/Repeat/Empty`). Every enumeration
re-runs the entire query.

The rule is restricted to `private` methods because callers form a *closed world* within the
compilation — we can enumerate them all and decide statically whether any of them passes a deferred
source. For public/internal methods this would be undecidable.

## When the rule fires

A diagnostic is produced at the **call site** when **all** of these conditions hold:

1. The target method is `private`.
2. The target method takes one or more `IEnumerable<T>` (or non-generic `IEnumerable`) parameters.
3. The target method body enumerates that parameter **≥ 2 times** (counting `foreach`, LINQ
enumerating methods, and `Task.WhenAll(items)`/`WaitAll`/etc.).
4. The total number of call sites for the method is **≤ 3** (configurable cap; above the cap the
rule conservatively gives up).
5. The argument expression at *this* call site is a **deferred query**, determined by walking up to
**3 hops** through local declarations in the caller's body. A "deferred query" is one of:
- A LINQ chain method call (`Where`, `Select`, `OrderBy`, `SelectMany`, `Concat`, `Take`, ...).
- `Enumerable.Range`, `Enumerable.Repeat`, or `Enumerable.Empty`.
- An invocation of an **iterator method** (a method with `yield return` / `yield break` in its
body).

## Examples

### ❌ Bad

```csharp
private int Helper(IEnumerable<int> items) {
var c = items.Count();
return c + items.Sum(); // <-- second enumeration
}

void Caller(IEnumerable<int> src) {
Helper(src.Where(x => x > 0)); // ❌ EPC40 — deferred query, Helper iterates twice
}
```

### ❌ Bad — iterator producer

```csharp
static IEnumerable<int> Produce() {
yield return 1;
yield return 2;
}

void Caller() {
Helper(Produce()); // ❌ EPC40 — iterator method
}
```

### ❌ Bad — 2-hop provenance

```csharp
void Caller(IEnumerable<int> src) {
var step1 = src.Where(x => x > 0);
var step2 = step1;
Helper(step2); // ❌ EPC40 — walked 2 hops back to .Where(...)
}
```

### ✅ Good — materialized argument

```csharp
void Caller(IEnumerable<int> src) {
Helper(src.Where(x => x > 0).ToList()); // breaks the deferred chain
}
```

### ✅ Good — change the parameter type

```csharp
private int Helper(IReadOnlyList<int> items) { // signals intent + lets you index/Count in O(1)
var c = items.Count;
var sum = 0;
foreach (var x in items) sum += x;
return c + sum;
}
```

### ✅ Not flagged — list/array caller

```csharp
void Caller(List<int> src) {
Helper(src); // List<T> is materialized; re-enumeration is cheap
}
```

### ✅ Not flagged — caller forwards its own parameter

```csharp
void Caller(IEnumerable<int> src) {
Helper(src); // We can't tell whether 'src' itself is a query without inter-procedural analysis
}
```

This is intentional — chasing provenance further would require inter-procedural dataflow, which
explodes quickly. If the *immediate* caller is itself private, see the **transitivity** note below.

## How to fix

Pick the right tool for the situation:

| Situation | Fix |
| --- | --- |
| Helper genuinely needs Count + foreach | Change the parameter type to `IReadOnlyList<T>` / `IReadOnlyCollection<T>` / `T[]` |
| Helper genuinely needs to enumerate twice | Call `items = items.ToList()` at the top of the helper, or materialize at the caller |
| Helper can be rewritten as one pass | Restructure to a single enumeration (often the best fix) |

Materializing at the **caller** preserves the helper's signature and makes the cost explicit to readers:

```csharp
Helper(src.Where(x => x > 0).ToList());
```

Materializing at the **callee** localizes the fix:

```csharp
private int Helper(IEnumerable<int> items) {
var list = items.ToList();
return list.Count + list.Sum();
}
```

## Limitations (v1)

- **Scope is `private` only.** `internal` methods are excluded for now because consumers under
`InternalsVisibleTo` can break the closed-world assumption. May be expanded in v1.1 after dogfooding.
- **Caller cap = 3.** If a private method has more than 3 call sites, the rule does not fire. Such
methods are effectively public within the assembly and the signal-to-noise drops.
- **Provenance hops = 3.** Chains longer than 3 local declarations are not followed.
- **No transitivity.** If `A` calls private `B` calls private `C`, and `C` enumerates twice and the
call from `B` to `C` forwards a deferred parameter of `B`, this rule does *not* propagate the
warning back to `A`. Transitive analysis is plausible but out of scope for v1.
- **No type-flow.** A `Func<IEnumerable<int>>` returning a query, invoked at the call site, is not
resolved through the delegate.

## See also

- [CA1851: Possible multiple enumerations of `IEnumerable` collection](https://learn.microsoft.com/en-us/dotnet/fundamentals/code-analysis/quality-rules/ca1851) — same hazard, single-method scope only.
- [EPC38](EPC38.md) — `IEnumerable<Task>` re-enumeration (correctness bug variant).
- [EPC39](EPC39.md) — quadratic enumeration inside a loop (performance bug variant).
Loading
Loading