Merge pull request #224 from ArcInstitute/feat/claude-support

noamteyssier · web-flow · commit e9c11a9e1387 · 2026-02-26T11:37:39.000-08:00
Feat/claude support
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -0,0 +1,84 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Project Overview
+
+**cell-eval** is a Python package and CLI tool for evaluating the performance of models that predict cellular responses to perturbations at the single-cell level. It is developed by the Arc Research Institute.
+
+An evaluation compares a *real* AnnData object against a *predicted* one and quantifies the differences between the two across a variety of metrics.
+
+- Python 3.11–3.12, managed with **uv** and built with **hatchling**
+- CLI entry point: `cell-eval` (defined in `src/cell_eval/__main__.py`)
+
+## Common Commands
+
+```bash
+# Install dependencies
+uv sync --all-extras --dev
+
+# Run all tests
+uv run pytest -v
+
+# Run a single test
+uv run pytest tests/test_eval.py::test_broken_adata_not_normlog -v
+
+# Formatting (check / fix)
+uv run ruff format --check
+uv run ruff format
+
+# Type checking
+uv run ty check
+
+# Verify CLI works
+uv run cell-eval --help
+```
+
+CI runs: formatting, typing, pytest, and cli-test (see `.github/workflows/CI.yml`).
+
+## Architecture
+
+### Core Data Flow
+
+```
+AnnData inputs (predicted + real)
+  → MetricsEvaluator (validation, normalization, DE computation)
+    → MetricPipeline (profile-based metric selection + execution)
+      → metrics_registry (global MetricRegistry instance)
+        → individual metric functions
+  → polars DataFrames (per-perturbation + aggregated results)
+```
+
+### Key Abstractions
+
+- **`MetricsEvaluator`** (`src/cell_eval/_evaluator.py`) — Main programmatic entry point. Validates input AnnData objects, computes differential expression via `pdex`, and orchestrates the metric pipeline.
+
+- **`MetricRegistry`** (`src/cell_eval/metrics/_registry.py`) — Global singleton `metrics_registry`. Metrics are registered with a name, type (`DE` or `ANNDATA_PAIR`), compute function, and best-value indicator. Supports both plain functions and class-based metrics requiring instantiation.
+
+- **`MetricPipeline`** (`src/cell_eval/_pipeline/_runner.py`) — Selects and runs metrics based on a profile (`full`, `minimal`, `vcc`, `de`, `anndata`, `pds`). Collects per-perturbation results and aggregates them.
+
+- **`Metric` protocol** (`src/cell_eval/metrics/base.py`) — All metric functions take either a `PerturbationAnndataPair` or `DEComparison` and return `float | dict[str, float]`.
+
+- **Type system** (`src/cell_eval/_types/`) — Immutable dataclasses: `PerturbationAnndataPair`, `DEComparison`, plus enums `MetricType`, `MetricBestValue`, `DESortBy`.
+
+### Metrics
+
+Metrics fall into two categories, both registered in `src/cell_eval/metrics/_impl.py`:
+
+- **AnnData metrics** (`_anndata.py`): pearson_delta, mse, mae, mse_delta, mae_delta, discrimination_score, clustering_agreement, edistance
+- **DE metrics** (`_de.py`): overlap/precision at N, spearman correlations, direction match, significant gene recall, ROC/PR AUC
+
+### CLI
+
+Subcommands in `src/cell_eval/_cli/`: `prep` (data preparation for VCC), `run` (evaluation), `baseline` (create baseline), `score` (normalize against baseline). CLI defaults are in `_cli/_const.py`.
+
+### Test Data Utilities
+
+`cell_eval.data` provides `build_random_anndata()` and `downsample_cells()` for generating synthetic AnnData objects in tests.
+
+## Conventions
+
+- Uses `polars` (not pandas) for DataFrames
+- Uses `match`/`case` statements (Python 3.10+ syntax)
+- Type hints throughout; PEP 561 `py.typed` marker present
+- Private modules prefixed with `_` (public API is re-exported from `__init__.py`)
diff --git a/pyproject.toml b/pyproject.toml
@@ -33,7 +33,3 @@ dev = [
 
 [project.scripts]
 cell-eval = "cell_eval.__main__:main"
-
-[tool.pyright]
-venvPath = "."
-venv = ".venv"
diff --git a/ruff.toml b/ruff.toml