feat: parallel multi-grader fan-out with consensus strategies #906

## Problem

Currently each target has a single `grader_target` for LLM-as-judge scoring. A single grader can have blind spots or biases that affect score reliability.

## Proposal

Add a `grader_targets` (plural) field that fans scoring out to several graders in parallel:

```yaml
- name: copilot-cli
  provider: copilot-cli
  grader_targets:
    - grader-openrouter
    - grader-gemini
  grader_strategy: consensus  # or: majority, any, all
```
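
For illustration, the fan-out for a config like the one above could look roughly like this. It is a minimal sketch, not the framework's implementation: `fan_out` and the `Grader` callable signature are hypothetical names, and each grader is assumed to be an async function that returns a score in [0, 1].

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical grader signature: takes the candidate output, returns a score in [0, 1].
Grader = Callable[[str], Awaitable[float]]


async def fan_out(graders: dict[str, Grader], output: str) -> list[dict]:
    """Run every grader concurrently and return one score entry per grader."""
    names = list(graders)
    # asyncio.gather preserves argument order, so scores line up with names.
    scores = await asyncio.gather(*(graders[name](output) for name in names))
    return [
        {"type": "llm-grader", "grader": name, "score": score}
        for name, score in zip(names, scores)
    ]
```

Because the graders run concurrently, wall-clock time should be close to that of the slowest grader rather than the sum of all of them.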

Each grader scores independently in parallel, then the strategy aggregates:

| Strategy | Behavior |
|----------|----------|
| `consensus` | All graders must agree (strictest) |
| `majority` | >50% of graders pass |
| `any` | At least one grader passes (most lenient) |
| `all` | Return all scores without aggregation (for analysis/comparison) |
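
One possible reading of the strategies in the table, as a sketch. Two things here are assumptions, not part of the proposal: the `PASS_THRESHOLD` cutoff that turns a numeric score into pass/fail, and treating `consensus` as "every grader passes". The `aggregate` helper is a hypothetical name.

```python
from typing import Optional

PASS_THRESHOLD = 0.5  # hypothetical cutoff; the proposal does not define one


def aggregate(scores: list[float], strategy: str = "majority") -> Optional[bool]:
    """Return the aggregated pass/fail verdict, or None for `all` (no aggregation)."""
    if not scores:
        raise ValueError("at least one grader score is required")
    passes = [score >= PASS_THRESHOLD for score in scores]
    if strategy == "consensus":
        # Assumed meaning: every grader must pass.
        return all(passes)
    if strategy == "majority":
        # Strictly more than half of the graders must pass.
        return sum(passes) * 2 > len(passes)
    if strategy == "any":
        return any(passes)
    if strategy == "all":
        # No aggregation: the caller reports the raw per-grader scores.
        return None
    raise ValueError(f"unknown grader_strategy: {strategy!r}")
```

Note that with an even number of graders, a tie fails under `majority` because of the strict ">50%" rule.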

## Result JSONL

When multiple graders are used, the result should include per-grader scores:

```json
{
  "scores": [
    { "type": "llm-grader", "grader": "grader-openrouter", "score": 0.9 },
    { "type": "llm-grader", "grader": "grader-gemini", "score": 0.8 }
  ],
  "grader_strategy": "majority",
  "grader_agreement": 1.0
}
```

The `grader_agreement` field (0.0–1.0) reports how closely the graders agree, where 1.0 means full agreement; it serves as a measure of inter-grader reliability.
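
The proposal does not specify how `grader_agreement` is computed. One candidate, sketched below, is pairwise agreement on pass/fail verdicts: the fraction of grader pairs that reach the same verdict. The `PASS_THRESHOLD` cutoff and the `grader_agreement` function are assumptions made for illustration.

```python
from itertools import combinations

PASS_THRESHOLD = 0.5  # hypothetical cutoff; the proposal does not define one


def grader_agreement(scores: list[float]) -> float:
    """Fraction of grader pairs whose pass/fail verdicts match (0.0 to 1.0)."""
    verdicts = [score >= PASS_THRESHOLD for score in scores]
    pairs = list(combinations(verdicts, 2))
    if not pairs:
        # A single grader trivially agrees with itself.
        return 1.0
    return sum(a == b for a, b in pairs) / len(pairs)
```

Under this definition the example result above (0.9 and 0.8, both passing) yields 1.0, which matches the `grader_agreement` value shown. A score-based measure such as variance would be an alternative if verdict-level agreement is too coarse.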

## Use cases

- **Reduce grader bias**: one LLM's blind spots covered by another
- **Cross-provider validation**: ensure scores aren't provider-dependent
- **Confidence scoring**: high agreement = high confidence in score
- **A/B testing graders**: compare grader quality before switching

## Prior art

- Google ADK: multiple evaluator judges with voting
- LMSYS Chatbot Arena: multi-judge ranking
- As far as I know, no eval framework exposes this declaratively in YAML config yet

## Backward compatibility

- `grader_target` (singular) continues to work unchanged
- `grader_targets` is optional; default strategy is `majority`
- When only one grader is specified in `grader_targets`, it behaves identically to `grader_target`
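
The compatibility rules above could be handled by normalizing both fields into one list at config load time. This is a sketch only: `resolve_graders` is a hypothetical helper, and preferring `grader_targets` when both fields are present is an assumption, since the proposal does not say which one wins.

```python
def resolve_graders(target: dict) -> tuple[list[str], str]:
    """Normalize a target config into (grader names, strategy)."""
    if target.get("grader_targets"):
        # Assumption: the plural field takes precedence if both are set.
        graders = list(target["grader_targets"])
    elif "grader_target" in target:
        # Legacy singular field becomes a one-element list.
        graders = [target["grader_target"]]
    else:
        graders = []
    # Default strategy per the proposal is `majority`.
    strategy = target.get("grader_strategy", "majority")
    return graders, strategy
```

With a single grader, every aggregating strategy reduces to that grader's own verdict, so existing `grader_target` configs keep their current behavior.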
