ArcInstitute
diff --git a/‎.github/workflows/ci.yml‎
Lines changed: 8 additions & 5 deletions b/‎.github/workflows/ci.yml‎
Lines changed: 8 additions & 5 deletions
diff --git a/‎.python-version‎
Lines changed: 1 addition & 1 deletion b/‎.python-version‎
Lines changed: 1 addition & 1 deletion
diff --git a/‎CLAUDE.md‎
Lines changed: 99 additions & 0 deletions b/‎CLAUDE.md‎
Lines changed: 99 additions & 0 deletions
diff --git a/‎README.md‎
Lines changed: 77 additions & 103 deletions b/‎README.md‎
Lines changed: 77 additions & 103 deletions
@@ -12,6 +12,9 @@ jobs:
 
   install-job:
     runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        python-version: ["3.13", "3.12", "3.11"]
 
     steps:
       - uses: actions/checkout@v4
@@ -21,7 +24,7 @@ jobs:
         with:
           enable-cache: true
           cache-dependency-glob: "pyproject.toml"
-          python-version: "3.12"
+          python-version: ${{ matrix.python-version }}
 
       - name: install dependencies
         run: |
@@ -40,7 +43,7 @@ jobs:
         with:
           enable-cache: true
           cache-dependency-glob: "pyproject.toml"
-          python-version: "3.12"
+          python-version: "3.13"
 
       - name: install dependencies
         run: |
@@ -63,15 +66,15 @@ jobs:
         with:
           enable-cache: true
           cache-dependency-glob: "pyproject.toml"
-          python-version: "3.12"
+          python-version: "3.13"
 
       - name: install dependencies
         run: |
           uv sync --all-extras --dev
 
       - name: run type checking
         run: |
-          uv run pyright
+          uv run ty check
 
   pytest:
     runs-on: ubuntu-latest
@@ -86,7 +89,7 @@ jobs:
         with:
           enable-cache: true
           cache-dependency-glob: "pyproject.toml"
-          python-version: "3.12"
+          python-version: "3.13"
 
       - name: install dependencies
         run: |
 
@@ -1 +1 @@
-3.10
+3.13
@@ -0,0 +1,99 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+**Important:** This file must be kept up to date with the codebase. Any time the public API, output schema, modes, parameters, or architecture changes, update the relevant sections here before closing the task.
+
+## Project Overview
+
+`pdex` is a Python library for Parallel Differential Expression (PDEX) analysis in single-cell genomics, focused on conditional screens.
+It computes per-gene statistics comparing perturbation groups against a reference using Mann-Whitney U tests with FDR correction.
+It also provides functionality for per-gene statistics on 1-vs-rest comparisons and on-target single-gene comparisons.
+
+## Commands
+
+```bash
+# Install / sync dependencies
+uv sync
+
+# Run all tests
+uv run pytest -v
+
+# Run a specific test file
+uv run pytest tests/test_pdex.py
+
+# Run a single test by name
+uv run pytest tests/test_pdex.py::TestPdexRefMode::test_columns
+
+# Lint and format
+uv run ruff format
+
+# Type check
+uv run ty check
+```
+
+## Architecture
+
+### Core Pipeline (`src/pdex/__init__.py`)
+
+The main entry point is `pdex(adata, groupby, mode, threads, is_log1p, geometric_mean, as_pandas, **kwargs)`, which:
+
+1. Validates the `groupby` column in `adata.obs`
+2. Extracts unique groups (filters NaN and empty strings)
+3. Identifies a reference group (defaults to `"non-targeting"` in `"ref"` and `"on_target"` modes)
+4. For each non-reference group, slices the expression matrix, computes pseudobulk (mean), fold change, percent change, and Mann-Whitney U statistic vs the reference
+5. Applies per-group FDR correction (scipy) and returns a Polars DataFrame (or pandas if `as_pandas=True`)
+
+Three modes:
+
+- `"ref"`: each non-reference group vs a single reference group (reference group is excluded from output)
+- `"all"`: each group vs all remaining cells (1-vs-rest)
+- `"on_target"`: each non-reference group vs the reference, but only at the single gene targeted by that group (requires `gene_col=` kwarg)
+
+Unexpected `**kwargs` for any mode trigger a `UserWarning`.
+
+### Key Files
+
+| File                   | Role                                                                                                    |
+| ---------------------- | ------------------------------------------------------------------------------------------------------- |
+| `src/pdex/__init__.py` | `pdex()` entry point and full pipeline logic                                                            |
+| `src/pdex/_math.py`    | Numba JIT-compiled `fold_change()`, `percent_change()`, and `mwu()` wrappers; `pseudobulk()` dispatcher |
+| `src/pdex/_utils.py`   | `set_numba_threadpool()` — sets Numba thread count before JIT warmup; `_detect_is_log1p()` heuristic    |
+
+### Performance Design
+
+- Numba JIT compilation accelerates per-cell/per-gene math (`fold_change`, `percent_change`, `_log1p_col_mean`, `_expm1_vec`)
+- `numba-mwu` (external dep) provides a Numba-accelerated Mann-Whitney U implementation
+- Sparse CSR matrices are handled by reusing pre-computed non-targeting column indices to avoid redundant dense conversion
+- Parallelism is controlled via `threads` passed to `set_numba_threadpool()`
+
+### Output Schema
+
+The returned Polars DataFrame (or pandas DataFrame when `as_pandas=True`) has columns:
+
+| Column              | Type  | Description                                                           |
+| ------------------- | ----- | --------------------------------------------------------------------- |
+| `target`            | str   | Perturbation group name                                               |
+| `feature`           | str   | Gene name                                                             |
+| `target_mean`       | float | Pseudobulk mean for the target group, always in natural (count) space |
+| `ref_mean`          | float | Pseudobulk mean for the reference, always in natural (count) space    |
+| `target_membership` | int   | Number of cells in the target group                                   |
+| `ref_membership`    | int   | Number of cells in the reference                                      |
+| `fold_change`       | float | log2(target_mean / ref_mean) — computed from pseudobulk means         |
+| `percent_change`    | float | (target_mean - ref_mean) / ref_mean — computed from pseudobulk means  |
+| `p_value`           | float | Mann-Whitney U p-value (per-cell vectors)                             |
+| `statistic`         | float | Mann-Whitney U statistic                                              |
+| `fdr`               | float | FDR-corrected p-value, applied per-group across genes. For `on_target` mode, applied across all groups.                 |
+
+`target_mean` and `ref_mean` are always in natural (count) space regardless of `is_log1p` or `geometric_mean`.
+FDR is corrected within each group (across genes) for `ref` and `all` modes. For `on_target` mode, it is applied across all resulting p-values.
+
+### Public API (`__all__`)
+
+```python
+from pdex import pdex, DEFAULT_REFERENCE
+```
+
+## Dependencies
+
+Managed with `uv`. Build backend: `hatchling`. Key packages: `anndata`, `numba`, `numba-mwu`, `polars`, `pyarrow`, `scipy`, `tqdm`. Dev tools: `pytest`, `ruff`, `ty`.
@@ -1,128 +1,102 @@
 # pdex
 
-parallel differential expression for single-cell perturbation sequencing
+Parallel differential expression for single-cell perturbation sequencing.
 
 ## Installation
 
-Add to your `pyproject.toml` file with [`uv`](https://github.com/astral-sh/uv)
-
 ```bash
+# add to pyproject.toml
 uv add pdex
-```
 
-## Summary
+# add to env
+uv pip install pdex
+```
 
-This is a python package for performing parallel differential expression between multiple groups and a control.
+## Overview
 
-It is optimized for very large datasets and very large numbers of perturbations.
+`pdex` computes per-gene differential expression statistics between perturbation groups in single-cell data using Mann-Whitney U tests with FDR correction. It was originally designed for CRISPR screen and perturbation sequencing datasets with many groups and large cell counts.
 
-It makes use of shared memory to parallelize the computation to a high number of threads and minimizes the [IPC](https://en.wikipedia.org/wiki/Inter-process_communication) between processes to reduce overhead.
+It supports dense and sparse (CSR) expression matrices, and uses [numba-mwu](https://github.com/noamteyssier/numba-mwu) for Numba-accelerated Mann-Whitney U computation.
 
-It supports the following metrics:
+## Modes
 
-- Wilcoxon Rank Sum
-- Anderson-Darling
-- T-Test
+| Mode          | Description                                                         |
+| ------------- | ------------------------------------------------------------------- |
+| `"ref"`       | Each group vs a single reference group (default: `"non-targeting"`) |
+| `"all"`       | Each group vs all remaining cells (1-vs-rest)                       |
+| `"on_target"` | Each group vs the reference at its single target gene only          |
 
-## Backed vs In-Memory AnnData
+## Usage
 
-pdex adapts its execution strategy based on how the AnnData object is stored:
+### Reference mode (default)
 
-- **In-memory AnnData** (e.g., loaded via `sc.read_h5ad(path)` without `backed="r"`):
-  pdex uses a shared-memory multiprocessing workflow. Each worker process gets
-  access to the full expression matrix through shared memory, which minimizes
-  serialization overhead. Parallelism is configured via the `num_workers`
-  parameter (process count). `num_threads` is ignored in this mode because numba
-  kernels operate on per-target slices entirely in memory.
-- **Backed AnnData** (`adata.X` is an on-disk HDF5 dataset): pdex automatically
-  switches to the low-memory chunked implementation. Gene chunks are streamed
-  from disk, the reference group is computed once per chunk, and targets are
-  processed in parallel via `num_workers` (thread pool). Within each target,
-  Wilcoxon metrics can additionally use numba parallelization controlled by
-  `num_threads`. This mode avoids loading the entire matrix into RAM while still
-  enabling both target-level and gene-level parallelism.
+```python
+import anndata as ad
+from pdex import pdex
 
-If a backed dataset is supplied without enabling low-memory mode, pdex raises a
-helpful error explaining that chunked processing is required. Conversely, you can
-force the chunked path for large in-memory matrices by passing `low_memory=True`.
+adata = ad.read_h5ad("screen.h5ad")
 
-## Parallelization
+results = pdex(
+    adata,
+    groupby="guide",
+    mode="ref",
+    is_log1p=False,
+)
+```
 
-`parallel_differential_expression` exposes two orthogonal knobs for controlling
-parallel execution:
+### 1-vs-rest mode
 
-- `num_workers` controls the number of Python threads that process targets within
-  each gene chunk. `None` (default in low-memory mode) enables an auto-detected
-  worker count based on available CPUs, while `1` disables thread-level parallelism.
-- `num_threads` controls the numba thread pool used by the Wilcoxon kernel. `None`
-  lets numba auto-detect the optimal size, whereas `1` turns numba parallelization
-  off. This setting is only used in low-memory mode and only when `metric="wilcoxon"`.
-  When pdex detects non-integer expression values in a gene chunk (for example, after
-  log-normalization), it automatically disables numba for that chunk, logs a warning,
-  and falls back to the SciPy implementation to preserve correct rank ordering.
+```python
+results = pdex(
+    adata,
+    groupby="guide",
+    mode="all",
+    is_log1p=False,
+)
+```
 
-These strategies can be combined: for example, `num_workers=2, num_threads=8`
-runs two target threads that share an eight-thread numba pool. When the metric
-does not support numba acceleration, pdex automatically logs a warning and
-falls back to thread-only execution.
+### On-target mode
 
-## Usage
+Requires a column in `adata.obs` mapping each group to its target gene:
 
 ```python
-import anndata as ad
-import numpy as np
-import pandas as pd
-
-from pdex import parallel_differential_expression
-
-PERT_COL = "perturbation"
-CONTROL_VAR = "control"
-
-N_CELLS = 1000
-N_GENES = 100
-N_PERTS = 10
-MAX_UMI = 1e6
-
-
-def build_random_anndata(
-    n_cells: int = N_CELLS,
-    n_genes: int = N_GENES,
-    n_perts: int = N_PERTS,
-    pert_col: str = PERT_COL,
-    control_var: str = CONTROL_VAR,
-) -> ad.AnnData:
-    """Sample a random AnnData object."""
-    return ad.AnnData(
-        X=np.random.randint(0, MAX_UMI, size=(n_cells, n_genes)),
-        obs=pd.DataFrame(
-            {
-                pert_col: np.random.choice(
-                    [f"pert_{i}" for i in range(n_perts)] + [control_var],
-                    size=n_cells,
-                    replace=True,
-                ),
-            }
-        ),
-    )
-
-
-def main():
-    adata = build_random_anndata()
-
-    # Run pdex with default metric (wilcoxon)
-    results = parallel_differential_expression(
-        adata,
-        reference=CONTROL_VAR,
-        groupby_key=PERT_COL,
-    )
-    assert results.shape[0] == N_GENES * N_PERTS
-
-    # Run pdex with alt metric (anderson)
-    results = parallel_differential_expression(
-        adata,
-        reference=CONTROL_VAR,
-        groupby_key=PERT_COL,
-        metric="anderson"
-    )
-    assert results.shape[0] == N_GENES * N_PERTS
+results = pdex(
+    adata,
+    groupby="guide",
+    mode="on_target",
+    gene_col="target_gene",
+    is_log1p=False,
+)
 ```
+
+## Parameters
+
+| Parameter        | Type           | Default           | Description                                                |
+| ---------------- | -------------- | ----------------- | ---------------------------------------------------------- |
+| `adata`          | `AnnData`      | required          | Annotated data matrix (dense or sparse CSR)                |
+| `groupby`        | `str`          | required          | Column in `adata.obs` defining groups                      |
+| `mode`           | `str`          | `"ref"`           | Comparison mode: `"ref"`, `"all"`, or `"on_target"`        |
+| `threads`        | `int`          | `0`               | Numba thread count (`0` = all CPUs)                        |
+| `is_log1p`       | `bool \| None` | `None`            | Whether data is log1p-transformed. Auto-detected if `None` |
+| `geometric_mean` | `bool`         | `True`            | Use geometric mean for pseudobulk (vs arithmetic)          |
+| `as_pandas`      | `bool`         | `False`           | Return a pandas DataFrame instead of Polars                |
+| `reference`      | `str`          | `"non-targeting"` | Reference group name (modes: `ref`, `on_target`)           |
+| `gene_col`       | `str`          | —                 | Column mapping groups to target genes (mode: `on_target`)  |
+
+## Output
+
+Returns a Polars DataFrame (or pandas if `as_pandas=True`) with one row per (group, gene) pair:
+
+| Column              | Description                                        |
+| ------------------- | -------------------------------------------------- |
+| `target`            | Perturbation group name                            |
+| `feature`           | Gene name                                          |
+| `target_mean`       | Pseudobulk mean for the target group (count space) |
+| `ref_mean`          | Pseudobulk mean for the reference (count space)    |
+| `target_membership` | Number of cells in the target group                |
+| `ref_membership`    | Number of cells in the reference                   |
+| `fold_change`       | log2(target_mean / ref_mean)                       |
+| `percent_change`    | (target_mean - ref_mean) / ref_mean                |
+| `p_value`           | Mann-Whitney U p-value                             |
+| `statistic`         | Mann-Whitney U statistic                           |
+| `fdr`               | FDR-corrected p-value (per-group, across genes). For `on_target` mode, this is applied across all groups.    |