This repository contains the source code for the ICePop paper (link).
The data used in this study are available on Zenodo.
The code used to reproduce the analyses in the paper is available at: https://github.com/krishnanlab/icepop_analysis
python>=3.11,<3.12
ICePop requires torch==2.1.1. To enable GPU acceleration, which substantially speeds up metacell reconstruction, we recommend installing PyTorch following the official instructions on the PyTorch website to ensure compatibility with your system. Specifically, we used torch-2.1.1+cu121 in the paper.
After installing torch, install ICePop via pip:
pip install git+https://github.com/krishnanlab/icepop_public
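Once torch is installed, it can help to confirm which compute device is visible before running the metacell step. The following is a minimal sketch and not part of ICePop: `pick_device` is a hypothetical helper name, and it relies only on `torch.cuda.is_available()`, falling back to CPU when torch cannot be imported.

```python
def pick_device() -> str:
    """Return 'cuda' if a CUDA-enabled torch install sees a GPU, else 'cpu'."""
    try:
        import torch  # imported lazily so the check also runs before torch is installed
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    # The printed value matches the choices accepted by `icepop metacell --device`.
    print(pick_device())
```

If this prints `cpu` on a machine with a GPU, the installed torch build is probably CPU-only and may need to be reinstalled following the PyTorch instructions.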
Before running the analysis, we recommend downloading the processed data from Zenodo.
Expand and place the downloaded files under ../data, then run the following commands.
A more detailed tutorial is available at notebook/ICePop_tutorial.ipynb
icepop metacell \
--h5ad ../data/TM_FACS/TM_FACS_cnt.h5ad \
--outdir ../results/TM_FACS \
--save_name TM_FACS
- --h5ad (str): Path to the input AnnData (.h5ad) file containing single-cell expression count data.
- --outdir (str): Output directory where MetaQ results will be written.
- --save_name (str; default='metaq_res'): Prefix for the MetaQ output under ./save/*. Provide a name only, not a path.
- --ncell_per_mc (int; default=75): Target number of cells per metacell. The total number of metacells is approximately n_cells / ncell_per_mc.
- --ct_key (str; default='cell_type'): Column name in adata.obs specifying cell-type annotations. Used to evaluate metacell purity.
- --device (str; default='cuda'): Compute device to use. Options include 'cuda' or 'cpu'.
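To get a feel for how --ncell_per_mc scales the output, the approximate relationship n_cells / ncell_per_mc can be worked through directly. This is a sketch only: `approx_n_metacells` is a hypothetical helper, and rounding to the nearest integer is an assumption, since the tool may round differently.

```python
def approx_n_metacells(n_cells: int, ncell_per_mc: int = 75) -> int:
    """Estimate the metacell count as roughly n_cells / ncell_per_mc."""
    if n_cells <= 0 or ncell_per_mc <= 0:
        raise ValueError("n_cells and ncell_per_mc must be positive")
    # Rounding to the nearest integer (minimum 1) is an assumption made here.
    return max(1, round(n_cells / ncell_per_mc))


print(approx_n_metacells(7500))      # 100 with the default of 75 cells per metacell
print(approx_n_metacells(7500, 50))  # 150 with smaller metacells
```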
A GPU is recommended for this step, since it runs considerably faster than on CPU.
- metacell assignment: outdir/mc_assign.csv
- metacell statistics: outdir/mc_stats.csv
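A quick sanity check on the assignment file is to count how many cells landed in each metacell. The sketch below uses only the standard library. The column name `metacell` is an assumption, not the documented layout of mc_assign.csv, so inspect the header of your file and pass the actual name.

```python
import csv
from collections import Counter


def metacell_sizes(path: str, mc_column: str = "metacell") -> Counter:
    """Count cells per metacell in an assignment CSV.

    `mc_column` is a hypothetical column name; replace it with the one
    found in the header of your mc_assign.csv.
    """
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        if mc_column not in (reader.fieldnames or []):
            raise KeyError(f"{mc_column!r} not in header {reader.fieldnames}")
        return Counter(row[mc_column] for row in reader)
```

Sizes far from the --ncell_per_mc target may be worth a closer look before moving on to the association step.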
icepop association \
--h5ad ../data/TM_FACS/TM_FACS_cnt.h5ad \
--mc_assign ../results/TM_FACS/mc_assign.csv \
--magmaz ../data/magmaz/asd.genes.out \
--sp mmusculus \
--outdir ../results/TM_FACS
- --h5ad (str): Input AnnData file containing single-cell expression count data.
- --mc_assign (str): CSV file mapping cells to metacell assignments (output from step 1: outdir/mc_assign.csv).
- --magmaz (str): MAGMA gene-level association file (*.genes.out) for a trait of interest.
- --spec_score (str; default=None): Precomputed specificity scores; calculated if not provided.
- --outdir (str): Output directory for association results.
- --n_jobs (int; default=20): Number of parallel workers.
- --sp (str; default='mmusculus'): Species identifier for gene ID conversion.
- --ct_key (str; default='cell_type'): Column in adata.obs defining cell types.
- --trait_name (str; optional): Trait name used for output file naming.
- --n_perm (int; default=1000): Number of permutations for null distribution estimation.
- --q_thres (float; default=0.1): FDR threshold for significance.
- --output_dfbs (boolean; default=True): Whether to output the influence-testing (DFBETAS) results.
- outdir/celltype__trait-*.csv: Disease-cell type association table
- outdir/dfbs__trait-*.npz: Gene-level influence scores (DFBETAS) for each disease–cell type association
- outdir/metacell__trait-*.csv: Disease-metacell association table
- outdir/mc_spec_score.npz: Specificity scores (written if not provided as part of the run arguments)
- outdir/mcfdr__trait-*.csv: Cell type × metacell matrix indicating significant disease-associated metacells within each cell type

Here * is the trait name; we assume the MAGMA file is named *.genes.out.
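The naming convention above can be illustrated with a short sketch that derives the trait label from a MAGMA file name and lists the trait-specific files to expect. Both function names are hypothetical, and the sketch assumes --trait_name was not passed, in which case the label is taken from the *.genes.out file name.

```python
from pathlib import Path

SUFFIX = ".genes.out"


def trait_from_magma(path: str) -> str:
    """Strip the '.genes.out' suffix from a MAGMA file name to get the trait label."""
    name = Path(path).name
    if not name.endswith(SUFFIX):
        raise ValueError(f"expected a *{SUFFIX} file, got {name!r}")
    return name[: -len(SUFFIX)]


def expected_outputs(outdir: str, trait: str) -> list[str]:
    """Trait-specific output paths, following the file patterns listed above."""
    patterns = [
        "celltype__trait-{}.csv",
        "dfbs__trait-{}.npz",
        "metacell__trait-{}.csv",
        "mcfdr__trait-{}.csv",
    ]
    return [str(Path(outdir) / p.format(trait)) for p in patterns]


trait = trait_from_magma("../data/magmaz/asd.genes.out")
print(trait)  # asd
for path in expected_outputs("../results/TM_FACS", trait):
    print(path)
```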
# run all gene sets
icepop interactive \
--outdir ../results/TM_FACS \
--geneset_collections All \
--adata_path ../data/TM_FACS/TM_FACS_cnt.h5ad
or
# run specific gene sets
icepop interactive \
--outdir ../results/TM_FACS \
--geneset_collections BIOCARTA \
--adata_path ../data/TM_FACS/TM_FACS_cnt.h5ad
or
# custom gene sets
icepop interactive \
--outdir ../results/TM_FACS \
--geneset_collections none \
--geneset_path custom.gmt \
--adata_path ../data/TM_FACS/TM_FACS_cnt.h5ad
- --outdir (str): Output directory containing the association results and metacell results.
- --geneset_collections (str): Gene set collections to run: 'All', 'BIOCARTA', 'KEGG', 'REACTOME', 'WIKIPATHWAYS', 'MIR', 'TF', 'GOBP', 'GOCC', 'GOMF', 'HP'.
- --geneset_path (str): Path to a custom GMT file, used if --geneset_collections is set to none.
- --adata_path (str): Path to the AnnData file containing single-cell expression count data.
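For the custom gene set option, a GMT file is a tab-separated text file with one gene set per line: the set name, a description field, then the member genes. The sketch below writes such a file. `write_gmt`, the set name, and the genes are placeholders for illustration, and which gene identifier type ICePop expects in the file is not covered here.

```python
def write_gmt(gene_sets: dict[str, list[str]], path: str, description: str = "NA") -> None:
    """Write gene sets as GMT: one set per line, tab-separated name, description, genes."""
    with open(path, "w", newline="\n") as handle:
        for name, genes in gene_sets.items():
            if not genes:
                raise ValueError(f"gene set {name!r} is empty")
            handle.write("\t".join([name, description, *genes]) + "\n")


if __name__ == "__main__":
    # Placeholder set name and gene symbols, for illustration only.
    write_gmt({"EXAMPLE_SET": ["Shank3", "Nrxn1", "Syngap1"]}, "custom.gmt")
```

The resulting custom.gmt can then be passed through --geneset_path.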
- outdir/icepop-report.ipynb: Interactive Jupyter notebook containing all results
- outdir/icepop-report.html: Rendered HTML version of the notebook for easy viewing
- outdir/enrichment: Directory containing gene set enrichment analysis results
Note: We recommend using the same --outdir for Step 1 and Step 2, as the interactive step expects results from both steps to be located in the same output directory.
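One way to keep the output directory consistent is to build all three command lines from a single variable. The sketch below only assembles and prints the commands using the flags documented above; `build_commands` is a hypothetical helper, and optional flags are left at their defaults.

```python
import shlex
from pathlib import Path


def build_commands(h5ad: str, magmaz: str, outdir: str, collections: str = "All") -> list[list[str]]:
    """Build argument lists for the three steps with one shared --outdir."""
    mc_assign = str(Path(outdir) / "mc_assign.csv")  # written by the metacell step
    return [
        ["icepop", "metacell", "--h5ad", h5ad, "--outdir", outdir],
        ["icepop", "association", "--h5ad", h5ad, "--mc_assign", mc_assign,
         "--magmaz", magmaz, "--outdir", outdir],
        ["icepop", "interactive", "--outdir", outdir,
         "--geneset_collections", collections, "--adata_path", h5ad],
    ]


for cmd in build_commands("../data/TM_FACS/TM_FACS_cnt.h5ad",
                          "../data/magmaz/asd.genes.out",
                          "../results/TM_FACS"):
    # Printing keeps this a dry run; subprocess.run(cmd, check=True) would execute it.
    print(shlex.join(cmd))
```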