cell2location is a Bayesian model for spatial transcriptomics that estimates cell type abundances in spatial data by integrating single-cell RNA-seq reference signatures. It is built on top of scvi-tools and Pyro.
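The core idea of "integrating reference signatures" can be illustrated with a simplified, deterministic sketch of the model's mean structure, based on our reading of the cell2location paper. The expected count of each gene at each location is a non-negative mixture of the reference signatures, weighted by cell abundances. It is then scaled by a per-gene technology sensitivity and a per-location detection efficiency. The real model also places priors on these quantities, may include additive background terms, and uses a negative binomial likelihood, none of which appear here. All names and numbers are hypothetical toy values.

```python
from typing import List

Matrix = List[List[float]]


def expected_counts(
    abundance: Matrix,                 # w[s][f]: number of cells of type f at location s
    signatures: Matrix,                # sig[f][g]: expected expression of gene g per cell of type f
    gene_sensitivity: List[float],     # m[g]: per-gene technology sensitivity (hypothetical name)
    location_efficiency: List[float],  # y[s]: per-location detection efficiency (hypothetical name)
) -> Matrix:
    """Return mu[s][g] = y[s] * m[g] * sum_f w[s][f] * sig[f][g]."""
    n_genes = len(signatures[0])
    mu: Matrix = []
    for s, w_row in enumerate(abundance):
        row = []
        for g in range(n_genes):
            # Mixture of reference signatures weighted by cell abundances
            mix = sum(w_sf * signatures[f][g] for f, w_sf in enumerate(w_row))
            # Multiplicative sensitivity and detection-efficiency scaling
            row.append(location_efficiency[s] * gene_sensitivity[g] * mix)
        mu.append(row)
    return mu


w = [[3.0, 0.0], [1.0, 2.0]]              # 2 locations x 2 cell types
sig = [[1.0, 0.5, 0.0], [0.0, 2.0, 1.0]]  # 2 cell types x 3 genes
print(expected_counts(w, sig, [1.0, 1.0, 1.0], [1.0, 1.0]))  # [[3.0, 1.5, 0.0], [1.0, 4.5, 2.0]]
print(expected_counts(w, sig, [0.5, 1.0, 2.0], [1.0, 2.0]))  # [[1.5, 1.5, 0.0], [1.0, 9.0, 8.0]]
```

Inference runs in the opposite direction. Given observed counts and fixed signatures, the model infers the abundances `w` (and the scaling factors) as posterior distributions.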
Repository: https://github.com/BayraktarLab/cell2location
- Language: Python 3.10+
- Core dependencies: pyro-ppl, scvi-tools, torch, numpy, pandas, scanpy
- ML framework: PyTorch + Pyro (probabilistic programming)
- Data format: AnnData (h5ad files)
- Black for code formatting (line length: 120)
- isort for import sorting (profile: black, trailing comma)
- flake8 for linting (ignored: E203, E266, E501, W503, W605, N812; max line length: 119)
- Pre-commit hooks enforce all of the above
Run formatting before committing:
```bash
black --line-length 120 .
isort .
flake8
```

Package layout:

```
cell2location/
├── models/            # Main model classes (Cell2location, RegressionModel)
│   ├── base/          # Base modules and mixins (Pyro integration)
│   ├── reference/     # Reference signature estimation model
│   └── simplified/    # Simplified model variants
├── nn/                # Neural network layers (FC layers, context layers)
├── dataloaders/       # Custom data loaders for spatial data
├── distributions/     # Custom probability distributions
├── cell_comm/         # Cell communication analysis
├── cluster_averages/  # Cluster average computation
├── plt/               # Plotting utilities
└── utils/             # General utilities
```
Tests are in `tests/`. Run with:

```bash
pytest
```

CI runs on Python 3.10 (Ubuntu). Tests include model training smoke tests.
- Models follow the scvi-tools pattern: a user-facing `Model` class wraps a PyTorch `Module`
- Pyro is used for variational inference (SVI with an ELBO objective)
- `QuantileMixin` and `PltExportMixin` add posterior quantile computation and plotting
- Data registration uses the scvi-tools `AnnDataManager` with field types (`LayerField`, `CategoricalObsField`, etc.)
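The composition described above can be pictured with a dependency-free sketch. A user-facing model class holds the data, wraps a module object that would own the parameters, and inherits extra methods from a mixin. Every class name here is a hypothetical stand-in, not a cell2location or scvi-tools class. The index-based empirical quantile is a simple substitute for however `QuantileMixin` actually computes posterior quantiles.

```python
from typing import Dict, List, Sequence


class ToyModule:
    """Hypothetical stand-in for the PyTorch/Pyro module that owns the parameters."""

    def __init__(self, n_genes: int) -> None:
        self.params: Dict[str, List[float]] = {"gene_scale": [1.0] * n_genes}


class ToyQuantileMixin:
    """Hypothetical mixin: any class inheriting it gains a quantile method."""

    def posterior_quantile(self, samples: Sequence[float], q: float) -> float:
        # Simple index-based empirical quantile over samples (no interpolation)
        if not samples or not 0.0 <= q <= 1.0:
            raise ValueError("need at least one sample and 0 <= q <= 1")
        ordered = sorted(samples)
        idx = min(int(q * len(ordered)), len(ordered) - 1)
        return ordered[idx]


class ToyModel(ToyQuantileMixin):
    """Hypothetical user-facing class: keeps the data and wraps the module."""

    def __init__(self, counts: List[List[float]]) -> None:
        self.counts = counts
        self.module = ToyModule(n_genes=len(counts[0]))


model = ToyModel([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
print(model.module.params["gene_scale"])                          # [1.0, 1.0, 1.0]
print(model.posterior_quantile([5.0, 1.0, 3.0, 2.0, 4.0], 0.5))   # 3.0
```

In the scvi-tools Pyro pattern, the module is also where the Pyro model and guide that SVI optimizes live. This separation lets the user-facing class concentrate on data registration, training, and exporting results.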
- Always ensure `adata.var_names` matches `cell_state_df.index` (same genes, in the same order) when using `Cell2location`
- The `N_cells_per_location` parameter (the prior expected number of cells per location) significantly affects model behavior
- Detection mean correction handles sensitivity differences between spatial and single-cell data
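The gene-matching requirement can be sketched without dependencies. Keep only the genes present in both the spatial data and the reference, fix one shared order, and subset both sides to it. The function name and the list/dict representation are hypothetical. With AnnData and pandas, the cell2location tutorials do the equivalent along the lines of `shared = np.intersect1d(adata_vis.var_names, inf_aver.index)`, followed by `adata_vis[:, shared].copy()` and `inf_aver.loc[shared, :].copy()`.

```python
from typing import Dict, List, Tuple


def align_to_reference(
    spatial_genes: List[str],          # plays the role of adata.var_names
    spatial_counts: List[List[float]], # locations x genes, columns ordered as spatial_genes
    reference: Dict[str, List[float]], # gene -> per-cell-type expression (like cell_state_df rows)
) -> Tuple[List[str], List[List[float]], List[List[float]]]:
    """Hypothetical helper: subset both inputs to shared genes in one common order."""
    ref_genes = set(reference)
    shared = [g for g in spatial_genes if g in ref_genes]  # keep the spatial order
    col = {g: i for i, g in enumerate(spatial_genes)}
    counts = [[row[col[g]] for g in shared] for row in spatial_counts]
    signatures = [reference[g] for g in shared]  # rows now line up with count columns
    return shared, counts, signatures


genes = ["Gad1", "Slc17a7", "Mt-co1"]                          # toy gene names
counts = [[5.0, 0.0, 9.0], [1.0, 7.0, 3.0]]                     # 2 locations x 3 genes
ref = {"Slc17a7": [0.1, 2.0], "Gad1": [1.5, 0.0]}               # 2 cell types per gene
shared, aligned_counts, aligned_sigs = align_to_reference(genes, counts, ref)
print(shared)          # ['Gad1', 'Slc17a7']
print(aligned_counts)  # [[5.0, 0.0], [1.0, 7.0]]
print(aligned_sigs)    # [[1.5, 0.0], [0.1, 2.0]]
```

Genes missing from the reference (here `Mt-co1`) are dropped rather than filled in. This keeps every count column paired with a signature row for the same gene.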