This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Important: This file must be kept up to date with the codebase. Any time the public API, output schema, modes, parameters, or architecture changes, update the relevant sections here before closing the task.
pdex is a Python library for Parallel Differential Expression (PDEX) analysis in single-cell genomics, focused on conditional screens.
It computes per-gene statistics comparing perturbation groups against a reference using Mann-Whitney U tests with FDR correction.
It also provides functionality for per-gene statistics on 1-vs-rest comparisons and on-target single-gene comparisons.
```bash
# Install / sync dependencies
uv sync
# Run all tests
uv run pytest -v
# Run a specific test file
uv run pytest tests/test_pdex.py
# Run a single test by name
uv run pytest tests/test_pdex.py::TestPdexRefMode::test_columns
# Lint and format
uv run ruff format
# Type check
uv run ty check
```

The main entry point is `pdex(adata, groupby, mode, threads, is_log1p, geometric_mean, as_pandas, epsilon, **kwargs)`, which:
- Validates the `groupby` column in `adata.obs`
- Extracts unique groups (filters NaN and empty strings)
- Identifies a reference group (defaults to `"non-targeting"` in `"ref"` and `"on_target"` modes)
- For each non-reference group, slices the expression matrix, computes pseudobulk (mean), fold change, percent change, and Mann-Whitney U statistic vs the reference
- Applies per-group FDR correction (scipy) and returns a Polars DataFrame (or pandas if `as_pandas=True`)
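
The per-gene comparison in the steps above can be illustrated with a minimal, standard-library sketch. It is not the library's implementation (pdex uses Numba and `numba-mwu`); `midranks` and `compare_gene` are hypothetical helpers, the p-value is omitted, and the sketch assumes the statistic is computed from the target group's rank sum, which this file does not state.

```python
from statistics import mean


def midranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def compare_gene(target, reference):
    """Pseudobulk means and a Mann-Whitney U statistic for one gene (hypothetical helper)."""
    ranks = midranks(list(target) + list(reference))
    n_target = len(target)
    rank_sum = sum(ranks[:n_target])
    # Assumption: U is taken from the target group's rank sum.
    u_stat = rank_sum - n_target * (n_target + 1) / 2
    return {
        "target_mean": mean(target),
        "ref_mean": mean(reference),
        "statistic": u_stat,
    }


# Toy per-cell expression values for a single gene.
result = compare_gene(target=[5.0, 7.0, 9.0], reference=[1.0, 2.0, 7.0])
```

In the real pipeline this comparison runs for every gene of every non-reference group, which is why the per-gene math is JIT-compiled and parallelized.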
Three modes:
- `"ref"`: each non-reference group vs a single reference group (reference group is excluded from output)
- `"all"`: each group vs all remaining cells (1-vs-rest)
- `"on_target"`: each non-reference group vs the reference, but only at the single gene targeted by that group (requires `gene_col=` kwarg)
Unexpected `**kwargs` for any mode trigger a `UserWarning`.
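
How the modes choose which cells to compare can be sketched as follows. `plan_comparisons` is a hypothetical helper, not part of pdex; it assumes string labels, treats `reference` as a named argument, and assumes `gene_col` is the only keyword expected in `"on_target"` mode. The restriction of `"on_target"` results to the targeted gene is left out.

```python
import warnings


def plan_comparisons(labels, mode, reference="non-targeting", **kwargs):
    """Return (target, target_idx, reference_idx) per comparison (hypothetical helper)."""
    # Assumption: only gene_col is an expected keyword, and only in on_target mode.
    expected = {"gene_col"} if mode == "on_target" else set()
    unexpected = sorted(set(kwargs) - expected)
    if unexpected:
        warnings.warn(f"Unexpected kwargs for mode {mode!r}: {unexpected}", UserWarning)
    if mode == "on_target" and "gene_col" not in kwargs:
        raise ValueError("on_target mode requires gene_col=")
    if mode not in ("ref", "all", "on_target"):
        raise ValueError(f"Unknown mode: {mode!r}")

    groups = sorted({g for g in labels if g})  # drop empty strings and None
    plans = []
    for group in groups:
        target_idx = [i for i, g in enumerate(labels) if g == group]
        if mode == "all":
            # 1-vs-rest: every other labelled cell acts as the reference.
            ref_idx = [i for i, g in enumerate(labels) if g and g != group]
        else:
            if group == reference:
                continue  # the reference group is excluded from the output
            ref_idx = [i for i, g in enumerate(labels) if g == reference]
        plans.append((group, target_idx, ref_idx))
    return plans


labels = ["non-targeting", "geneA", "geneA", "non-targeting", "geneB"]
ref_plans = plan_comparisons(labels, mode="ref")
all_plans = plan_comparisons(labels, mode="all")
```

With these toy labels, `"ref"` mode yields two comparisons (one per perturbation) while `"all"` mode yields three, because the reference group is itself compared against the rest.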
| File | Role |
|---|---|
| `src/pdex/__init__.py` | `pdex()` entry point and full pipeline logic |
| `src/pdex/_math.py` | Numba JIT-compiled `fold_change()`, `percent_change()`, and `mwu()` wrappers; `pseudobulk()` dispatcher |
| `src/pdex/_utils.py` | `set_numba_threadpool()`, which sets the Numba thread count before JIT warmup; `_detect_is_log1p()` heuristic |
- Numba JIT compilation accelerates per-cell/per-gene math (`fold_change`, `percent_change`, `_log1p_col_mean`, `_expm1_vec`)
- `numba-mwu` (external dep) provides a Numba-accelerated Mann-Whitney U implementation
- Sparse CSR matrices are handled by reusing pre-computed non-targeting column indices to avoid redundant dense conversion
- Parallelism is controlled via `threads` passed to `set_numba_threadpool()`
The returned Polars DataFrame (or pandas DataFrame when as_pandas=True) has columns:
| Column | Type | Description |
|---|---|---|
| `target` | `str` | Perturbation group name |
| `feature` | `str` | Gene name |
| `target_mean` | `float` | Pseudobulk mean for the target group, always in natural (count) space |
| `ref_mean` | `float` | Pseudobulk mean for the reference, always in natural (count) space |
| `target_membership` | `int` | Number of cells in the target group |
| `ref_membership` | `int` | Number of cells in the reference |
| `fold_change` | `float` | `log2((target_mean + epsilon) / (ref_mean + epsilon))`, computed from pseudobulk means |
| `percent_change` | `float` | `(target_mean - ref_mean) / (ref_mean + epsilon)`, computed from pseudobulk means |
| `p_value` | `float` | Mann-Whitney U p-value (per-cell vectors) |
| `statistic` | `float` | Mann-Whitney U statistic |
| `fdr` | `float` | FDR-corrected p-value, applied per-group across genes. For `on_target` mode, applied across all groups. |
target_mean and ref_mean are always in natural (count) space regardless of is_log1p or geometric_mean.
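
As a worked illustration of the `fold_change` and `percent_change` formulas in the table, here is a plain-Python sketch. The `_sketch` function names and the example `epsilon` values are assumptions for illustration; the library's own versions are Numba-compiled and its default `epsilon` is not stated here.

```python
import math


def fold_change_sketch(target_mean, ref_mean, epsilon):
    """log2 ratio of pseudobulk means, with epsilon guarding against zeros."""
    return math.log2((target_mean + epsilon) / (ref_mean + epsilon))


def percent_change_sketch(target_mean, ref_mean, epsilon):
    """Relative change of the target mean with respect to the reference mean."""
    return (target_mean - ref_mean) / (ref_mean + epsilon)


# Means are in natural (count) space, as in the output schema.
fc = fold_change_sketch(4.0, 1.0, epsilon=0.0)     # log2(4 / 1)
pc = percent_change_sketch(4.0, 1.0, epsilon=0.0)  # (4 - 1) / 1

# With both means at zero, a positive epsilon keeps both values finite.
fc_zero = fold_change_sketch(0.0, 0.0, epsilon=1e-6)
pc_zero = percent_change_sketch(0.0, 0.0, epsilon=1e-6)
```

Note that `percent_change` is a fraction rather than a value multiplied by 100, following the formula in the table.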
FDR is corrected within each group (across genes) for ref and all modes. For on_target mode, it is applied across all resulting p-values.
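
The difference between per-group and pooled correction can be sketched in plain Python. This assumes the Benjamini-Hochberg procedure, which this file does not confirm (it only says the correction comes from scipy); `bh_adjust` is a hypothetical helper and the p-values are made up for illustration.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Illustrative p-values: two genes for each of two groups.
p_by_group = {"geneA": [0.01, 0.04], "geneB": [0.03, 0.5]}

# "ref" and "all" modes: correct within each group, across genes.
per_group = {group: bh_adjust(p) for group, p in p_by_group.items()}

# "on_target" mode: one p-value per group, so correct across all of them together.
pooled = bh_adjust([p for values in p_by_group.values() for p in values])
```

Pooling changes the number of tests `m` in the correction, so the same raw p-value can receive a different `fdr` depending on the mode.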
```python
from pdex import pdex, DEFAULT_REFERENCE
```

Managed with uv. Build backend: hatchling. Key packages: anndata, numba, numba-mwu, polars, pyarrow, scipy, tqdm. Dev tools: pytest, ruff, ty.