Incremental comparative Atlas Construction with Bregman-Regularized Replay

This repository implements an incremental comparative atlas construction framework using:

Bregman Information (BI) to build a replay buffer
Fisher Information–based importance weighting
Regularized incremental model updates

The workflow integrates case–control data into an existing reference atlas while mitigating catastrophic forgetting of the reference.

Overview of the Method

The pipeline consists of four stages:

Input

Integrated reference atlas (multi-study healthy reference)
Case–control query data (multi-study)

(1) Compute Bregman Information → Create Replay Buffer

Select informative reference cells to preserve during continual training.
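As a minimal sketch, assume BI is computed from a classifier's logits over several augmented copies of each cell, using the log-sum-exp (LSE) generator. The per-cell score is then the Jensen gap E[LSE(z)] - LSE(E[z]). It is zero when all augmentations agree and grows as the predictions vary. The array layout and the helper names (`logsumexp`, `bregman_information`) are hypothetical and may not match what `SCANVI.get_uncertainty` does internally.

```python
import numpy as np


def logsumexp(z: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable log-sum-exp along `axis`."""
    m = np.max(z, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(z - m), axis=axis))


def bregman_information(logits: np.ndarray) -> np.ndarray:
    """Per-cell Bregman Information with the log-sum-exp generator (assumed form).

    logits: shape (n_augmentations, n_cells, n_classes), the classifier logits
    for each cell under each (hypothetical) augmentation.
    Returns shape (n_cells,), non-negative up to floating-point error.
    """
    # E_aug[LSE(z)]: generator averaged over augmentations
    mean_of_lse = logsumexp(logits, axis=-1).mean(axis=0)
    # LSE(E_aug[z]): generator applied to the augmentation-averaged logits
    lse_of_mean = logsumexp(logits.mean(axis=0), axis=-1)
    # Jensen gap; non-negative because LSE is convex
    return mean_of_lse - lse_of_mean


# Example: 200 augmentations, 5 cells, 3 cell types
rng = np.random.default_rng(0)
scores = bregman_information(rng.normal(size=(200, 5, 3)))
```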

(2) Compute Importance Weights

Estimate Fisher Information for encoder and decoder weights.
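Fisher Information is commonly approximated by its diagonal, which is the average of squared per-sample gradients of the log-likelihood. The sketch below shows this diagonal empirical Fisher for a toy logistic-regression model that stands in for scANVI's encoder and decoder. The model and the helper name `diagonal_empirical_fisher` are assumptions for illustration, and cscanvi may use a different estimator.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def diagonal_empirical_fisher(w: np.ndarray, X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Diagonal empirical Fisher of a toy logistic-regression model.

    w: (n_features,) weights; X: (n_samples, n_features); y: (n_samples,) in {0, 1}.
    For logistic regression, d/dw log p(y | x, w) = (y - sigmoid(x @ w)) * x.
    """
    residual = y - sigmoid(X @ w)             # (n_samples,)
    per_sample_grads = residual[:, None] * X  # (n_samples, n_features)
    # Mean of squared per-sample gradients = diagonal of the empirical Fisher
    return np.mean(per_sample_grads ** 2, axis=0)
```

Parameters with a large Fisher value strongly affect the likelihood of the data used to estimate it, so they are the ones worth protecting during the incremental update.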

Comparative Atlas Construction

Perform an incremental model update, regularized using the replay buffer from (1) and the importance weights from (2).
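Because training exposes an `ewc_importance` parameter, the update is presumably regularized in the style of elastic weight consolidation (EWC). In EWC, each parameter θ_i is pulled toward its reference value θ*_i in proportion to its Fisher importance F_i: L = L_task + (λ/2) Σ_i F_i (θ_i - θ*_i)². Below is a minimal NumPy sketch of that penalty. It assumes this standard form; the 1/2 factor, the summation over all parameters, and the helper names are assumptions, not taken from cscanvi.

```python
import numpy as np


def ewc_penalty(params, ref_params, fisher, ewc_importance):
    """Quadratic EWC-style penalty (assumed form).

    params, ref_params, fisher: dicts mapping parameter names to arrays of the same shape.
    """
    penalty = 0.0
    for name, theta in params.items():
        diff = np.asarray(theta) - np.asarray(ref_params[name])
        # Fisher-weighted squared distance from the reference parameters
        penalty += float(np.sum(np.asarray(fisher[name]) * diff ** 2))
    return 0.5 * ewc_importance * penalty


def regularized_loss(task_loss, params, ref_params, fisher, ewc_importance):
    """Task loss plus the EWC-style penalty."""
    return task_loss + ewc_penalty(params, ref_params, fisher, ewc_importance)
```

For example, with λ = 0.1 and F = (1, 0.5), moving w from (0, 0) to (1, 2) costs 0.5 · 0.1 · (1·1 + 0.5·4) = 0.15.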

Installation

1. Clone the Repository

git clone https://github.com/theislab/comparative_atlas
cd comparative_atlas

2. Set up a conda environment

Create and activate the environment from the environment.yml file provided in the repository:

conda env create -f environment.yml
conda activate cscanvi

Quick start

Here we provide an example of training a scANVI model incrementally.

Tip

You can download a simulated case–control PBMC scRNA-seq dataset, featuring increased IFN signaling in monocytes from case samples, along with the corresponding reference model from this link.

Import the modified SCANVI class from the cscanvi source:

from cscanvi._scanvi import SCANVI

Construct a replay buffer by computing the Bregman Information (BI) metric for each cell. Here we select 20% of the reference cells using the reference model, ref_model. The gene expression counts of the atlas are stored in adata. We compute BI from 200 augmentations per cell (tta_rep = N) and then select cells using the 'step' ordering.

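import scvi

# ref_model_path: path to the saved reference scANVI model
# adata: AnnData holding the atlas gene expression counts
ref_model = scvi.model.SCANVI.load(ref_model_path, adata)

# replay 20% of the reference cells
prop_cells_to_replay = 0.2
num_points_bi = int(adata.n_obs * prop_cells_to_replay)

# number of augmentations used to score each cell
N = 200

unc_scores, score_idx = SCANVI.get_uncertainty(adata,
                                               ref_model,
                                               order='step',
                                               num_points=num_points_bi,
                                               tta_rep=N)

# score_idx is a torch tensor of selected cell indices
adata_healthyRef_replay = adata[score_idx.detach().cpu().numpy()]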
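For intuition, a simpler selection rule keeps the highest-scoring fraction of cells. The sketch below is a plain top-k rule, not the 'step' ordering used by get_uncertainty. It assumes that higher scores mark more informative cells, and `select_top_fraction` is a hypothetical helper.

```python
import numpy as np


def select_top_fraction(scores: np.ndarray, prop: float) -> np.ndarray:
    """Indices of the `prop` fraction of cells with the highest scores.

    Illustrative top-k rule only; not the 'step' ordering in cscanvi.
    """
    k = int(len(scores) * prop)       # same rounding as num_points_bi
    order = np.argsort(scores)[::-1]  # cell indices by descending score
    return np.sort(order[:k])         # keep the original cell order
```

The returned indices can index an AnnData directly, e.g. `adata[select_top_fraction(scores, 0.2)]`.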

Next, we compute Fisher Information to estimate parameter importance. This requires a subset of control cells from the query:

import scanpy as sc

# select 50% of the query control cells for computing Fisher Information
healthy_controls = query_adata.obs.condition.isin(['control'])
adata_queryCtrl = sc.pp.subsample(query_adata[healthy_controls].copy(), fraction=0.5, copy=True)

# concatenate the replay buffer with the query data
query_adata = query_adata.concatenate(adata_healthyRef_replay)

# add the query-control subset and the replay buffer to .uns
query_adata.uns['ctrl_query'] = adata_queryCtrl
query_adata.uns['replay_adata'] = adata_healthyRef_replay

# set up the query model and compute importance weights
query_model = SCANVI.load_query_data_with_replay(query_adata,
                                                 reference_model=ref_model_path,
                                                 unfrozen=True,
                                                 control_uns_key='ctrl_query',
                                                 replay_uns_key='replay_adata')

Set the desired value of ewc_importance (the regularization strength) and train:

contl_epochs = 150
train_kwargs_surgery = {
    "early_stopping": True,
    "early_stopping_monitor": "elbo_train",
    "early_stopping_patience": 10,
    "early_stopping_min_delta": 0.001,
    "plan_kwargs": {"ewc_importance": 0.1, "weight_decay": 0.0},
}

query_model.train(
    max_epochs=contl_epochs,
    **train_kwargs_surgery,
)

Important

Very strong regularization can cause training to stop prematurely.
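With the settings above, training stops once the monitored metric (elbo_train) fails to improve by more than early_stopping_min_delta for early_stopping_patience consecutive epochs. The sketch below is an illustrative re-implementation of that rule. It loosely mirrors min-mode early stopping in PyTorch Lightning, which scvi-tools builds on. `early_stopping_epoch` is a hypothetical helper and does not reproduce cscanvi's training loop.

```python
def early_stopping_epoch(losses, patience=10, min_delta=0.001):
    """Return the 0-based epoch at which training would stop, or None.

    An epoch counts as an improvement only if the monitored value drops
    more than `min_delta` below the best value seen so far.
    """
    best = float("inf")
    wait = 0
    for epoch, value in enumerate(losses):
        if value < best - min_delta:
            best = value  # improvement: reset the patience counter
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None
```

If strong regularization keeps the monitored loss flat, a run such as `early_stopping_epoch([5.0] * 21)` stops at epoch 10.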

Reproducibility

The scANVI models for the comparative CRC all-lineage, epithelial-lineage and NK-T cell lineage integrations, together with the notebooks to reproduce the manuscript figures, will be released progressively.

Note

The integrated objects and the associated metadata are now available on HuggingFace.

Citation

If you use this project, please cite:

@article{hediyeh2026perturbation,
  title={Perturbation-guided mapping of colorectal cancer cell states to causal mechanisms},
  author={Hediyeh-zadeh, Soroor and Toh, Tzen S and Dufva, Olli and Serra, Giuseppe and Jackmola, Rashika and Fourneaux, Camille and Pinto, Goncalo and Fang, Zijian and Picco, Gabriele and Oliver, Amanda J and others},
  journal={bioRxiv},
  pages={2026--03},
  year={2026},
  publisher={Cold Spring Harbor Laboratory}
}
