theislab/comparative_atlas

Supp_fig_modelOverview

Incremental Comparative Atlas Construction with Bregman-Regularized Replay

This repository implements an incremental comparative atlas construction framework using:

  1. Bregman Information (BI) to build a replay buffer
  2. Fisher Information–based importance weighting
  3. Regularized incremental model updates

The workflow enables robust case–control integration without catastrophic forgetting.


Overview of the Method

The pipeline takes two inputs and proceeds in three steps:

Input

  • Integrated reference atlas (multi-study healthy reference)
  • Case–control query data (multi-study)

(1) Compute Bregman Information → Create Replay Buffer

Select informative reference cells to preserve during continual training.
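
As a rough illustration of the scoring idea, the sketch below computes a Bregman Information score from the logits a classifier produces for several augmentations of one cell. It assumes the general definition BI = E[φ(X)] − φ(E[X]) for a convex generator φ, with log-sum-exp as one common choice of generator for logits. The function names, the toy logits, and the top-k selection rule are hypothetical and are not taken from this repository; in particular, the `order='step'` strategy used by `SCANVI.get_uncertainty` is not reproduced here.

```python
import math

def logsumexp(logits):
    """Numerically stable log(sum(exp(logits)))."""
    m = max(logits)
    return m + math.log(sum(math.exp(v - m) for v in logits))

def bregman_information(logit_samples):
    """BI = mean_i LSE(z_i) - LSE(mean_i z_i) over augmented logit vectors z_i."""
    n = len(logit_samples)
    mean_logits = [sum(col) / n for col in zip(*logit_samples)]
    mean_lse = sum(logsumexp(z) for z in logit_samples) / n
    return mean_lse - logsumexp(mean_logits)

def select_replay_cells(scores, num_points):
    """Indices of the num_points highest-scoring cells (hypothetical top-k rule)."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:num_points]

# Toy logits for three cells, two augmentations each (illustrative values only).
cells = [
    [[2.0, 0.0], [2.0, 0.0]],  # prediction unchanged under augmentation
    [[2.0, 0.0], [0.0, 2.0]],  # prediction flips under augmentation
    [[2.0, 0.0], [1.0, 1.0]],  # prediction shifts moderately
]
scores = [bregman_information(c) for c in cells]
replay_idx = select_replay_cells(scores, num_points=2)
```

Because log-sum-exp is convex, the score is non-negative and is zero when every augmentation yields the same logits, so cells whose predictions vary most under augmentation receive the highest scores.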

(2) Compute Importance Weights

Estimate Fisher Information for encoder and decoder weights.
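
To make the notion of parameter importance concrete, here is a minimal sketch of an empirical diagonal Fisher Information estimate: the mean of squared per-sample gradients of the log-likelihood. It uses a toy logistic model rather than the scANVI encoder and decoder, and the function names and data are hypothetical.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def diagonal_fisher(weights, xs, ys):
    """Empirical diagonal Fisher: mean over samples of (d log p(y|x) / d w_j)^2."""
    fisher = [0.0] * len(weights)
    for x, y in zip(xs, ys):
        p = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
        # Gradient of the Bernoulli log-likelihood w.r.t. each weight.
        grad = [(y - p) * xi for xi in x]
        for j, g in enumerate(grad):
            fisher[j] += g * g
    return [f / len(xs) for f in fisher]

# Toy data: the second feature is always zero, so its weight never affects the likelihood.
weights = [0.0, 0.0]
xs = [[1.0, 0.0], [2.0, 0.0]]
ys = [1, 0]
fisher = diagonal_fisher(weights, xs, ys)
```

A weight with a large Fisher value strongly affects the likelihood of the data, so changing it is penalized more; a weight with zero Fisher value can move freely.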

(3) Comparative Atlas Construction

Perform an incremental model update, regularized by the replay buffer from (1) and the importance weights from (2).
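
The `ewc_importance` option used in the quick start suggests an elastic weight consolidation (EWC) style penalty. The sketch below shows the standard quadratic form of such a penalty under that assumption; the 1/2 scaling convention, the function names, and the toy numbers are illustrative and may differ from what this repository implements.

```python
def ewc_penalty(params, ref_params, fisher, importance):
    """0.5 * importance * sum_j F_j * (theta_j - theta_ref_j)^2."""
    return 0.5 * importance * sum(
        f * (p - r) ** 2 for p, r, f in zip(params, ref_params, fisher)
    )

def regularized_loss(task_loss, params, ref_params, fisher, importance):
    """Loss on the new data plus the penalty for drifting from the reference weights."""
    return task_loss + ewc_penalty(params, ref_params, fisher, importance)

# Toy example: only the first weight is important (Fisher value 4.0).
ref_params = [1.0, 1.0]
fisher = [4.0, 0.0]
params = [1.5, 3.0]
penalty = ewc_penalty(params, ref_params, fisher, importance=0.1)
total = regularized_loss(2.0, params, ref_params, fisher, importance=0.1)
```

Here the second weight moves far from its reference value at no cost because its importance is zero, while the smaller change in the first weight accounts for the whole penalty.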


Installation

1. Clone the Repository

git clone https://github.com/theislab/comparative_atlas
cd comparative_atlas

2. Set up a conda environment

Use the environment.yml file provided in the repo:

conda env create -f environment.yml
conda activate cscanvi

Quick start

Here we provide an example of training a scANVI model incrementally.

Tip

You can download a simulated case–control PBMC scRNA-seq dataset—featuring increased IFN signaling in monocytes from case samples—along with the corresponding reference model from this link.

Import the modified SCANVI class from source

from cscanvi._scanvi import SCANVI

Construct a replay buffer by computing the Bregman Information (BI) metric for each cell. Here we select 20% of the cells from the reference model, ref_model. The gene expression counts from the atlas are stored in adata. We compute BI by generating 200 augmentations per cell to score it, then select cells using the step ordering (order='step').

import scvi

# Load the reference scANVI model; ref_model_path and adata are defined by the user.
ref_model = scvi.model.SCANVI.load(ref_model_path, adata)

# Replay 20% of the reference cells.
prop_cells_to_replay = 0.2
num_points_bi = int(adata.n_obs * prop_cells_to_replay)

# Number of augmentations used to score each cell.
N = 200

unc_scores, score_idx = SCANVI.get_uncertainty(adata,
                                               ref_model,
                                               order='step',
                                               num_points=num_points_bi,
                                               tta_rep=N)

# Subset the atlas to the selected replay cells.
adata_healthyRef_replay = adata[score_idx.detach().cpu().numpy()]

Next we compute Fisher Information to estimate parameter importance. To compute Fisher Information, we first need to select a subset of control cells from the query:

import scanpy as sc

# select a subset (here 50%) of query control cells for computing Fisher Information
healthy_controls = query_adata.obs.condition.isin(['control'])
adata_queryCtrl = sc.pp.subsample(query_adata[healthy_controls].copy(), 0.5, copy=True)

# concatenate the replay buffer with the query data
query_adata = query_adata.concatenate(adata_healthyRef_replay)

# add the query-control subset and replay buffer to uns. 
query_adata.uns['ctrl_query'] = adata_queryCtrl
query_adata.uns['replay_adata'] = adata_healthyRef_replay

# compute importance weights
query_model = SCANVI.load_query_data_with_replay(query_adata, 
                                                 reference_model = ref_model_path,
                                                 unfrozen=True,
                                                 control_uns_key = 'ctrl_query',
                                                 replay_uns_key = 'replay_adata'
                                                )

Set the desired value for ewc_importance (the regularization strength) and train:

contl_epochs = 150
train_kwargs_surgery = {
    "early_stopping": True,
    "early_stopping_monitor": "elbo_train",
    "early_stopping_patience": 10,
    "early_stopping_min_delta": 0.001,
    "plan_kwargs": {"ewc_importance": 0.1, "weight_decay": 0.0},
}


query_model.train(
    max_epochs=contl_epochs,
    **train_kwargs_surgery, 
)

Important

Very strong regularization can terminate training prematurely.
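
One plausible reading, given that the training example enables early stopping, is that a heavily penalized model stops improving the monitored metric and exhausts the patience window. The sketch below is a generic patience-based stopping rule, assuming a lower-is-better metric; it is not the implementation used by the training backend, and the function name and metric values are hypothetical.

```python
def stopping_epoch(metric_per_epoch, patience, min_delta):
    """Return the 1-based epoch at which training stops, or None if it never stops early."""
    best = float("inf")
    stale = 0  # consecutive epochs without an improvement larger than min_delta
    for epoch, value in enumerate(metric_per_epoch, start=1):
        if value < best - min_delta:
            best = value
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return None

# A metric that plateaus after the second epoch triggers early stopping.
plateau = stopping_epoch([10.0, 9.0, 9.0, 9.0, 9.0], patience=3, min_delta=0.001)

# A metric that keeps improving runs to the end.
improving = stopping_epoch([5.0, 4.0, 3.0, 2.0], patience=2, min_delta=0.001)
```

If training stops after only a few epochs, lowering ewc_importance or raising early_stopping_patience are the two settings from the example above that are worth revisiting.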

Reproducibility

The scANVI models for the comparative CRC all-lineage, epithelial-lineage, and NK-T cell lineage integrations, along with the notebooks that reproduce the figures in the manuscript, will be released progressively.

Note

The integrated objects and the associated metadata are now available on Hugging Face.

Citation

If you use this project, please cite:

@article{hediyeh2026perturbation,
  title={Perturbation-guided mapping of colorectal cancer cell states to causal mechanisms},
  author={Hediyeh-zadeh, Soroor and Toh, Tzen S and Dufva, Olli and Serra, Giuseppe and Jackmola, Rashika and Fourneaux, Camille and Pinto, Goncalo and Fang, Zijian and Picco, Gabriele and Oliver, Amanda J and others},
  journal={bioRxiv},
  pages={2026--03},
  year={2026},
  publisher={Cold Spring Harbor Laboratory}
}

About

Incremental Atlas Expansion with case-control scRNAseq data
