Rewrite README: metrics pipeline + autoresearch

Andrey Golovanov · claude · Andrey Golovanov · commit d4c4a7fc561d · 2026-03-26T11:18:12.000Z
Previous README had wrong license (AGPL, now MIT), wrong API examples
(compute_bac_metrics doesn't exist), and no mention of autoresearch.

New README covers both capabilities with working examples:
- Metrics pipeline: CLI and Python API with correct function names
- Autoresearch: three-stage pipeline, quick start, multi-cycle, persistence

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
diff --git a/README.md b/README.md
@@ -1,33 +1,14 @@
 # NetLab
 
-Metrics aggregation and statistical analysis for [NetGraph](https://github.com/networmix/NetGraph) simulation results.
+Metrics and autonomous research tools for [NetGraph](https://github.com/networmix/NetGraph) network simulations.
 
-## Overview
+## What It Does
 
-NetLab processes NetGraph workflow outputs (JSON artifacts) to compute statistical metrics across random seeds and failure scenarios. Provides CLI and Python API for batch analysis, cross-seed aggregation, and visualization.
+NetLab has two capabilities:
 
-## Features
+1. **Metrics pipeline** — computes verified reliability metrics (BAC, latency, alpha) from ngraph simulation results. Per-direction, occurrence-count-weighted, hand-verified against 252 assertions.
 
-### Metrics
-
-- **BAC (Bandwidth Availability Curve)**: Delivered bandwidth quantiles, availability at thresholds, AUC normalization
-- **Latency**: Stretch distributions (p50/p99), tail degradation ratios, SLO compliance, WES (Weighted Excess Stretch)
-- **IterationOps**: Per-iteration SPF calls, flows created, reoptimization calls, placement iterations
-- **SPS (Structural Pair Survivability)**: Fraction of src-dst pairs meeting demand under failures
-- **MSD (Maximum Supported Demand)**: Alpha-star multiplier for traffic matrix scaling capacity
-- **CostPower**: CapEx/Power totals, USD/Watt per Gbit (offered and at p99.9 reliability)
-
-### Cross-Seed Analysis
-
-- Positional alignment of time-series data by iteration index
-- Median and IQR (interquartile range) computation
-- Variable-length series handling with NaN padding
-
-### Visualization
-
-- Cross-seed plots with median curves and IQR bands
-- Baseline-normalized delta comparisons
-- Statistical significance heatmaps (p-values)
+2. **Autoresearch** — LLM-driven topology exploration. Describe a connectivity idea in natural language, and the system generates a valid ngraph scenario, runs the simulation, computes metrics, and produces a structural interpretation. The LLM also proposes the next experiment, closing the research loop.
 
 ## Installation
 
@@ -43,68 +24,177 @@ cd NetLab
 make dev
 ```
 
-## Usage
+## Metrics
 
 ### CLI
 
 ```bash
-# Compute metrics for scenarios
-netlab metrics tests/data/scenarios/
+# Compute metrics for all scenarios in a directory
+netlab metrics path/to/scenarios/
 
-# With summary tables and plots
-netlab metrics tests/data/scenarios/ --summary
+# Summary tables only, no plots
+netlab metrics path/to/scenarios/ --no-plots
 
 # Filter specific scenarios
-netlab metrics tests/data/scenarios/ --only small_clos,small_dragonfly
-
-# Skip plot generation
-netlab metrics tests/data/scenarios/ --no-plots
+netlab metrics path/to/scenarios/ --only small_clos,small_dragonfly
 ```
 
 ### Python API
 
 ```python
-from metrics.bac import compute_bac_metrics
-from metrics.aggregate import summarize_across_seeds
+import json
+from metrics.bac import compute_bac
+from metrics.latency import compute_latency_stretch
+from metrics.msd import compute_alpha_star
+
+# Load ngraph results
+with open("scenario.results.json") as f:
+    results = json.load(f)
+
+# Capacity
+alpha = compute_alpha_star(results)
+print(f"alpha_star: {alpha.alpha_star}")
+
+# Bandwidth availability (aggregate + per direction)
+bac = compute_bac(results, step_name="tm_placement")
+print(f"BAC AUC: {bac.auc_normalized:.4f}")
+for label, pf in bac.per_flow.items():
+    print(f"  {label}: AUC={pf.auc_normalized:.4f}")
+
+# Latency stretch
+lat = compute_latency_stretch(results)
+print(f"baseline p99: {lat.baseline['p99']:.4f}")
+print(f"failure p99:  {lat.failures['p99']:.4f}")
+```
+
+### Metrics Reference
+
+| Metric | What it measures |
+|--------|-----------------|
+| **BAC** | Delivered bandwidth distribution across failure iterations. AUC, quantiles, availability at thresholds, BW at probability levels. Per-direction breakdown. |
+| **Latency** | Volume-weighted stretch (cost / baseline cost). p50, p95, p99 percentiles, SLO compliance, WES (weighted excess stretch). |
+| **Alpha (MSD)** | Maximum demand multiplier the topology supports before saturation. |
+| **SPS** | Fraction of source-destination demand satisfied under failures. |
+| **CostPower** | CapEx and power normalized by offered demand and reliable bandwidth. |
+| **IterOps** | Failure iteration counts, unique pattern counts, timing. |
+
+All metrics correctly handle ngraph's Monte Carlo deduplication (`occurrence_count` expansion).
+
+## Autoresearch
+
+LLM-driven topology research with verified metrics. Three stages:
+
+```
+Hypothesis (natural language)
+    ↓
+[Generation Loop] LLM → ngraph YAML → inspect → validate → iterate
+    ↓
+[Simulation] ngraph run (expensive, once)
+    ↓
+[Analysis] Metrics pipeline (verified) → LLM interprets → proposes next hypothesis
+```
+
+### Quick Start
+
+```python
+import sys
+from pathlib import Path
+from netlab.autoresearch.backend import ClaudeCLIBackend
+from netlab.autoresearch.hypothesis_manager import HypothesisManager
+
+manager = HypothesisManager(
+    project_dir=Path("/tmp/my_research"),
+    backend=ClaudeCLIBackend(model="sonnet"),
+    ngraph_bin=str(Path(sys.executable).parent / "ngraph"),
+)
+
+cycle = manager.run_cycle("""
+2-site topology, 3 backbone planes, 100 Gbps cross-site per plane.
+Internal 500 Gbps. BB nodes with role: bb.
+Demands: 100 Gbps each direction, ECMP.
+Failure: single random BB node, 20 iterations.
+""")
+
+print(cycle.analysis.metrics_report)       # verified numbers
+print(cycle.analysis.interpretation)       # LLM explanation
+print(cycle.analysis.next_hypothesis)      # what to test next
+```
 
-# Compute BAC for a workflow step
-bac = compute_bac_metrics(iterations, offered_bw, step_name="max_demand")
-print(f"p99 availability: {bac['q_pct'][0.99]:.2%}")
+### Multi-Cycle Research
 
-# Aggregate across seeds
-summary = summarize_across_seeds(series_by_seed, label="latency_p99")
+```python
+hypothesis = "your initial idea..."
+for i in range(5):
+    cycle = manager.run_cycle(hypothesis)
+    print(f"Cycle {cycle.cycle_id}: {cycle.status}")
+    hypothesis = cycle.analysis.next_hypothesis  # LLM proposes next
 ```
 
+### What Gets Persisted
+
+```
+project_dir/
+  cycle_log.jsonl           # one-line summary per cycle
+  cycles/001/
+    hypothesis.yml          # what was tested
+    scenario.yml            # generated ngraph YAML
+    results/                # ngraph simulation output
+    metrics_report.md       # verified numbers (machine-generated)
+    interpretation.md       # structural explanation (LLM-generated)
+    next_hypothesis.md      # suggested next experiment (LLM-generated)
+    status.yml              # analyzed | failed | skipped
+```
+
+### Key Design Decision
+
+The LLM never extracts numbers from results. The metrics pipeline (same code that passed 252 hand-calculated assertions) computes all numbers programmatically. The LLM receives verified metrics and provides only interpretation — connecting numbers to topology structure.
+
 ## Repository Structure
 
 ```
-metrics/            # Core metrics modules
-  bac.py            # Bandwidth availability
-  latency.py        # Latency percentiles
-  iterops.py        # Iteration analysis
-  aggregate.py      # Cross-seed aggregation
-  plot_*.py         # Visualization
-netlab/             # CLI
-  cli.py            # Command-line interface
-  metrics_cmd.py    # Metrics command implementation
-tests/              # Test suite
-lib/                # Config files
+metrics/                    # Verified metrics pipeline
+  common.py                 # Shared: expand_flow_results, canonical_dc
+  bac.py                    # Bandwidth availability curve
+  latency.py                # Latency stretch analysis
+  msd.py                    # Maximum supported demand
+  sps.py                    # Structural pair survivability
+  iterops.py                # Iteration counts and timing
+  aggregate.py              # Cross-seed aggregation
+  costpower.py              # Cost and power normalization
+  matrixdump.py             # Per-pair placement matrices
+  metrics_report.py         # → autoresearch (in netlab/)
+netlab/
+  cli.py                    # CLI entry point
+  metrics_cmd.py            # Metrics command orchestration
+  autoresearch/
+    generation_loop.py      # Inner loop 1: idea → validated YAML
+    analysis_loop.py        # Inner loop 2: metrics → interpretation
+    metrics_report.py       # Programmatic metrics → markdown
+    hypothesis_manager.py   # Outer loop: hypothesis cycles + persistence
+    backend.py              # LLM backends (Claude CLI, OpenAI, mock)
+    scenario_generator.py   # DC-BB topology generator
+    sweep.py                # Parametric sweep runner
+tests/
+  data/mini_dcbb.yaml       # 10-node verification scenario
+  test_mini_dcbb_verification.py  # 252 hand-calculated assertions
 ```
 
 ## Development
 
 ```bash
 make dev        # Setup environment
-make check      # Tests and linting
+make check      # Pre-commit + tests + lint
 make test       # Tests only
 make lint       # Linting only
+make qt         # Quick tests (skip slow)
 ```
 
 ## Requirements
 
 - Python 3.11+
-- Dependencies: numpy, pandas, matplotlib, seaborn, scipy, rich, ngraph
+- [ngraph](https://github.com/networmix/NetGraph) >= 0.21.0
+- [netgraph-core](https://github.com/networmix/NetGraph-Core) >= 0.7.0
 
 ## License
 
-AGPL-3.0-or-later
+[MIT License](LICENSE)
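The README's Metrics Reference says all metrics handle ngraph's Monte Carlo deduplication through `occurrence_count` expansion. The sketch below shows why that matters for BAC-style statistics. It is not NetLab's implementation. Several pieces are assumptions made for illustration:

- the `FailurePattern` record;
- the helper names `expand`, `availability_at` and `auc_normalized`;
- the normalized-AUC definition, taken here as the area under the availability curve for delivered/offered over [0, 1].

`metrics/bac.py` may define any of these differently.

```python
# Hypothetical sketch: occurrence-count-weighted BAC statistics.
# FailurePattern is an assumed record layout, not ngraph's results schema.
from dataclasses import dataclass


@dataclass
class FailurePattern:
    """One deduplicated Monte Carlo failure pattern (hypothetical layout)."""

    delivered_gbps: float  # bandwidth delivered under this pattern
    occurrence_count: int  # how many iterations produced this pattern


def expand(patterns: list[FailurePattern]) -> list[float]:
    """Repeat each pattern's delivered bandwidth occurrence_count times."""
    samples: list[float] = []
    for p in patterns:
        samples.extend([p.delivered_gbps] * p.occurrence_count)
    return samples


def availability_at(samples: list[float], offered_gbps: float, threshold: float) -> float:
    """Fraction of iterations delivering at least threshold * offered_gbps."""
    if not samples:
        raise ValueError("no samples")
    hits = sum(1 for s in samples if s >= threshold * offered_gbps)
    return hits / len(samples)


def auc_normalized(samples: list[float], offered_gbps: float) -> float:
    """Area under A(x) = P(delivered / offered >= x) for x in [0, 1].

    For a ratio clipped to [0, 1], this area equals the mean clipped ratio,
    so no explicit curve integration is needed.
    """
    if not samples:
        raise ValueError("no samples")
    return sum(min(s / offered_gbps, 1.0) for s in samples) / len(samples)


if __name__ == "__main__":
    offered = 200.0
    patterns = [
        FailurePattern(200.0, 15),  # full delivery, most common
        FailurePattern(100.0, 4),   # half delivery
        FailurePattern(0.0, 1),     # rare total outage
    ]
    samples = expand(patterns)
    print(f"availability@100%: {availability_at(samples, offered, 1.0):.2f}")
    print(f"AUC (expanded):    {auc_normalized(samples, offered):.2f}")
    unique = [p.delivered_gbps for p in patterns]
    print(f"AUC (unweighted):  {auc_normalized(unique, offered):.2f}")
```

With these example numbers, expanding by occurrence count gives an AUC of 0.85. Averaging the three unique patterns without weights gives 0.5, because it overweights the rare total outage from 1 in 20 iterations to 1 in 3.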