Commit 215c0f9

Merge branch 'master' into extra_covariates
2 parents b99040c + 8154f60 commit 215c0f9

70 files changed

Lines changed: 5349 additions & 7880 deletions

Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
1+
---
2+
name: Bug report
3+
about: Cell2location doesn’t do what it should? Please help us fix it!
4+
title: ''
5+
labels: bug
6+
assignees: ''
7+
---
8+
9+
- [ ] I have confirmed this bug exists on the latest version of cell2location. See https://github.com/BayraktarLab/cell2location#installation
10+
- [ ] I follow the instructions from the [scvi-tools tutorial](https://cell2location.readthedocs.io/en/latest/notebooks/cell2location_tutorial.html).
11+
12+
---
13+
14+
**Note**: Please read [this guide](https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports) detailing how to provide the necessary information for us to reproduce your bug.
15+
16+
17+
### Minimal code sample (that we can run without your data, using public data)
18+
19+
```python
20+
# Your code here
21+
```
22+
23+
```pytb
24+
[Paste the error output produced by the above code here]
25+
```
26+

.github/ISSUE_TEMPLATE/config.yml

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
1+
blank_issues_enabled: false
2+
contact_links:
3+
- name: scverse Discourse
4+
url: https://discourse.scverse.org/c/ecosytem/cell2location/
5+
about: Ask usage questions and how to solve problems with cell2location and other scvi-tools packages.
6+
7+
- name: Frequently asked questions
8+
url: https://github.com/BayraktarLab/cell2location/issues?q=is%3Aissue+is%3Aopen+label%3AFAQ
9+
about: Before asking a question, please check this list (issues with the FAQ tag).
10+
11+
- name: cell2location Community Discussions [deprecated]
12+
url: https://discourse.scverse.org/c/ecosytem/cell2location/
13+
about: Find previous answers/issues. For new questions please use the link above.
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
1+
---
2+
name: Enhancement request
3+
about: Anything you’d like to see in cell2location?
4+
title: ''
5+
labels: enhancement
6+
assignees: ''
7+
---
8+
9+
<!-- Please describe your feature you would like to see below: -->
10+
...

.github/ISSUE_TEMPLATE/question.md

Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
1+
---
2+
name: Usage Question
3+
about: Template for posting a question to scverse Discourse.
4+
title: ''
5+
labels: question
6+
assignees: ''
7+
---
8+
9+
## Please use the template below to post a question to https://discourse.scverse.org/c/ecosytem/cell2location/.
10+
11+
### Problem
12+
13+
<!-- Please describe your problem below: -->
14+
...
15+
16+
- [ ] I follow the instructions from the [cell2location tutorial (using scvi-tools)](https://cell2location.readthedocs.io/en/latest/notebooks/cell2location_tutorial.html).
17+
- [ ] I have adjusted the required hyperparameters `N_cells_per_location` and `detection_alpha` to my dataset and tissue.
18+
- [ ] I have provided 10X reaction/inlet as `batch_key` for reference NB regression.
19+
- [ ] I have checked [scverse Discourse](https://discourse.scverse.org/c/ecosytem/cell2location/) and [old Cell2location Community Forum](https://github.com/BayraktarLab/cell2location/discussions), and did not find a solution.
20+
21+
22+
### Description of the data input and hyperparameters
23+
24+
<!-- Please briefly describe your single cell reference data: -->
25+
...
26+
27+
<!-- Please briefly describe your spatial data: -->
28+
...
29+
30+
#### Single cell reference data: number of cells, number of cell types, number of genes
31+
32+
<!-- Please add this info: -->
33+
...
34+
35+
#### Single cell reference data: technology type (e.g. mix of 10X 3' and 5')
36+
37+
<!-- Please add this info: -->
38+
...
39+
40+
#### Spatial data: number of locations, technology type (e.g. Visium, ISS, Nanostring WTA)
41+
42+
<!-- Please add this info: -->
43+
...

.pre-commit-config.yaml

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
11
repos:
22
- repo: https://github.com/python/black
3-
rev: 20.8b1
3+
rev: '22.3.0'
44
hooks:
55
- id: black
66
- repo: https://gitlab.com/pycqa/flake8

README.md

Lines changed: 149 additions & 11 deletions
@@ -5,12 +5,17 @@
55
### Comprehensive mapping of tissue cell architecture via integrated single cell and spatial transcriptomics (cell2location model)
66

77
[![Stars](https://img.shields.io/github/stars/BayraktarLab/cell2location?logo=GitHub&color=yellow)](https://github.com/BayraktarLab/cell2location/stargazers)
8-
![Build Status](https://github.com/BayraktarLab/cell2location/actions/workflows/test.yml/badge.svg)
8+
![Build Status](https://github.com/BayraktarLab/cell2location/actions/workflows/test.yml/badge.svg?event=push)
99
[![Documentation Status](https://readthedocs.org/projects/cell2location/badge/?version=latest)](https://cell2location.readthedocs.io/en/stable/?badge=latest)
1010
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/BayraktarLab/cell2location/blob/master/docs/notebooks/cell2location_tutorial.ipynb)
11-
[![Docker image on quay.io](https://img.shields.io/badge/container-quay.io/vitkl/cell2location-brightgreen "Docker image on quay.io")](https://quay.io/vitkl/cell2location)
11+
[![Docker image on quay.io](https://img.shields.io/badge/container-quay.io/vitkl/cell2location-brightgreen "Docker image on quay.io")](https://quay.io/vitkl/cell2location)
1212

13-
Preprint: https://www.biorxiv.org/content/10.1101/2020.11.15.378125v1
13+
If you use cell2location please cite our paper:
14+
15+
Kleshchevnikov, V., Shmatko, A., Dann, E. et al. Cell2location maps fine-grained cell types in spatial transcriptomics. Nat Biotechnol (2022). https://doi.org/10.1038/s41587-021-01139-4
16+
https://www.nature.com/articles/s41587-021-01139-4
17+
18+
Please note that cell2location requires 2 user-provided hyperparameters (`N_cells_per_location` and `detection_alpha`). For detailed guidance on setting these hyperparameters and their impact, see [the flow diagram and the note](https://github.com/BayraktarLab/cell2location/blob/master/docs/images/Note_on_selecting_hyperparameters.pdf). Many real datasets (especially human) show within-slide variability in RNA detection sensitivity, which requires you to try both recommended settings of the `detection_alpha` parameter: `detection_alpha=200` for low within-slide technical variability and `detection_alpha=20` for high within-slide technical variability.
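The advice to try both `detection_alpha` settings can be sketched as a small helper that prepares one set of keyword arguments per run. The helper name `hyperparameter_settings` and the value `30` are illustrative assumptions, not part of cell2location; each dictionary is meant to be passed on to the model set-up step described in the tutorial.

```python
def hyperparameter_settings(n_cells_per_location, detection_alphas=(200, 20)):
    """Return one kwargs dict per recommended detection_alpha value (hypothetical helper)."""
    return [
        {"N_cells_per_location": n_cells_per_location, "detection_alpha": alpha}
        for alpha in detection_alphas
    ]


# 30 is a placeholder; choose N_cells_per_location based on your own tissue
for kwargs in hyperparameter_settings(n_cells_per_location=30):
    # each dict would be forwarded to the model set-up step shown in the tutorial
    print(kwargs)
```

Keeping the two settings in one list makes it easy to run the same pipeline twice and compare the results side by side.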
1419

1520
Cell2location is a principled Bayesian model that can resolve fine-grained cell types in spatial transcriptomic data and create comprehensive cellular maps of diverse tissues. Cell2location accounts for technical sources of variation and borrows statistical strength across locations, thereby enabling the integration of single cell and spatial transcriptomics with higher sensitivity and resolution than existing tools. This is achieved by estimating which combination of cell types in which cell abundance could have given the mRNA counts in the spatial data, while modelling technical effects (platform/technology effect, contaminating RNA, unexplained variance).
1621

@@ -21,11 +26,9 @@ Overview of the spatial mapping approach and the workflow enabled by cell2locati
2126

2227
## Usage and Tutorials
2328

24-
The tutorial covering the estimation of expresson signatures of reference cell types, spatial mapping with cell2location and the downstream analysis can be found here: https://cell2location.readthedocs.io/en/latest/
25-
26-
You can also try cell2location on [Google Colab](https://colab.research.google.com/github/BayraktarLab/cell2location/blob/master/docs/notebooks/cell2location_tutorial.ipynb) on a smaller data subset containing somatosensory cortex.
29+
The tutorial covering the estimation of expression signatures of reference cell types, spatial mapping with cell2location and the downstream analysis can be found at https://cell2location.readthedocs.io/en/latest/ and tried on [Google Colab](https://colab.research.google.com/github/BayraktarLab/cell2location/blob/master/docs/notebooks/cell2location_tutorial.ipynb).
2730

28-
Please report bugs via https://github.com/BayraktarLab/cell2location/issues and ask any usage questions in https://github.com/BayraktarLab/cell2location/discussions.
31+
Please report bugs via https://github.com/BayraktarLab/cell2location/issues and ask any usage questions about [cell2location](https://discourse.scverse.org/c/ecosytem/cell2location/42), [scvi-tools](https://discourse.scverse.org/c/help/scvi-tools/7) or [Visium data](https://discourse.scverse.org/c/general/visium/32) on the scverse community Discourse.
2932

3033
Cell2location package is implemented in a general way (using https://pyro.ai/ and https://scvi-tools.org/) to support multiple related models - both for spatial mapping, estimating reference cell type signatures and downstream analysis.
3134

@@ -58,10 +61,10 @@ bash Miniconda3-latest-Linux-x86_64.sh
5861
# use prefix /path/to/software/miniconda3
5962
```
6063

61-
Before installing cell2location and it's dependencies, it could be necessary to make sure that you are creating a fully isolated conda environment by telling python to NOT use user site for installing packages, ideally by adding this line to your `~/.bashrc` file , but this would also work during a terminal session:
64+
Before installing cell2location and its dependencies, it may be necessary to make sure that you are creating a fully isolated conda environment by telling Python NOT to use the user site for installing packages. To do so, run this line before creating the conda environment and every time before activating the conda environment in a new terminal session:
6265

6366
```bash
64-
export PYTHONNOUSERSITE="someletters"
67+
export PYTHONNOUSERSITE="literallyanyletters"
6568
```
6669

6770

@@ -76,12 +79,147 @@ Cell2location architecture is designed to simplify extended versions of the mode
7679
We thank all paper authors for their contributions:
7780
Vitalii Kleshchevnikov, Artem Shmatko, Emma Dann, Alexander Aivazidis, Hamish W King, Tong Li, Artem Lomakin, Veronika Kedlian, Mika Sarkin Jain, Jun Sung Park, Lauma Ramona, Liz Tuck, Anna Arutyunyan, Roser Vento-Tormo, Moritz Gerstung, Louisa James, Oliver Stegle, Omer Ali Bayraktar
7881

79-
We also thank Krzysztof Polanski, Luz Garcia Alonso, Carlos Talavera-Lopez, Ni Huang for feedback on the package, Martin Prete for dockerising cell2location and other software support.
82+
We also thank Pyro developers (Fritz Obermeyer, Martin Jankowiak), Krzysztof Polanski, Luz Garcia Alonso, Carlos Talavera-Lopez, Ni Huang for feedback on the package, Martin Prete for dockerising cell2location and other software support.
8083

8184
## FAQ
8285

8386
See https://github.com/BayraktarLab/cell2location/discussions
8487

8588
## Future development and experimental features
89+
Future developments of cell2location are focused on 1) scalability to 100k to 1 million+ locations using amortised inference of cell abundance (same ideas as used in VAE), 2) extending cell2location to related spatial analysis tasks that require modification of the model (such as using cell type hierarchy information), and 3) incorporating features presented by more recently proposed methods (such as CAR spatial proximity modelling). We are also experimenting with Numpyro and JAX (https://github.com/vitkl/cell2location_numpyro).
90+
91+
## Tips
92+
93+
### Conda environment for A100 GPUs
94+
95+
```bash
96+
export PYTHONNOUSERSITE="literallyanyletters"
97+
conda create -y -n test_scvi16_cuda113 python=3.9
98+
conda activate test_scvi16_cuda113
99+
conda install -y -c anaconda hdf5 pytables git
100+
pip install scvi-tools
101+
pip install git+https://github.com/BayraktarLab/cell2location.git#egg=cell2location[tutorials]
102+
pip3 install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 -f https://download.pytorch.org/whl/torch_stable.html
103+
conda activate test_scvi16_cuda113
104+
python -m ipykernel install --user --name=test_scvi16_cuda113 --display-name='Environment (test_scvi16_cuda113)'
105+
```
106+
107+
### Issues with package version mismatches often originate from python user site rather than conda environment being used to install a subset of packages
108+
109+
Before installing cell2location and its dependencies, it may be necessary to make sure that you are creating a fully isolated conda environment by telling Python NOT to use the user site for installing packages. To do so, run this line before creating the conda environment and every time before activating the conda environment in a new terminal session:
86110

87-
We also provide an experimental numpyro translation of the model which has improved memory efficiency (allowing analysis of multiple Visium samples on Google Colab) and minor improvements in speed - https://github.com/vitkl/cell2location_numpyro. You can try it on Google Colab [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/vitkl/cell2location_numpyro/blob/main/docs/notebooks/cell2location_short_demo_colab.ipynb) - however note that both numpyro itself and cell2location_numpyro are in very active development.
111+
```bash
112+
export PYTHONNOUSERSITE="literallyanyletters"
113+
```
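To confirm that the variable has the intended effect, one option is to start a fresh interpreter and inspect `sys.flags.no_user_site`, which Python sets when `PYTHONNOUSERSITE` is a non-empty string. This is a minimal sketch using only the standard library; the helper name `no_user_site_flag` is an assumption made for illustration.

```python
import os
import subprocess
import sys


def no_user_site_flag(env):
    """Start a fresh interpreter with `env` and return its sys.flags.no_user_site value."""
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.flags.no_user_site)"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


# same setting as the export line above, applied to a copy of the current environment
isolated_env = dict(os.environ, PYTHONNOUSERSITE="literallyanyletters")
print("user site disabled:", no_user_site_flag(isolated_env) > 0)
```

A value greater than zero means the interpreter will not add the user site-packages directory to `sys.path`.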
114+
115+
### Useful code for reading and combining multiple Visium sections
116+
117+
Keeping info on distinct sections in a csv file (Google Sheet).
118+
119+
```python
import os
from glob import glob
from re import sub

import pandas as pd

# sp_data_folder (path prefix of the Visium output folders) is assumed to be defined earlier
sample_annot = pd.read_csv('./sample_annot.csv')

# map each Sample_ID to its folder by stripping the prefix and the genome suffix from the folder name
paths = glob(f'{sp_data_folder}*')
sample_annot['path'] = pd.Series(
    paths,
    index=[sub('^.+WTSI_', '', sub('_GRCh38-2020-A$', '', i)) for i in paths]
)[sample_annot['Sample_ID']].values
sample_annot['file'] = [os.path.basename(i) for i in sample_annot['path']]

sample_annot['Sample_ID'].unique()
```
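The two nested `sub` calls derive a sample ID from a folder name: the inner call drops the genome suffix, the outer call drops everything up to and including `WTSI_`. A minimal sketch of that step on its own, where the function name and the folder path are hypothetical and chosen only for illustration:

```python
from re import sub


def sample_id_from_path(path):
    """Strip everything up to 'WTSI_' and the trailing genome suffix from a folder path."""
    return sub('^.+WTSI_', '', sub('_GRCh38-2020-A$', '', path))


# hypothetical folder name, for illustration only
print(sample_id_from_path('/data/visium/WTSI_sampleA_GRCh38-2020-A'))
```

Because `^.+WTSI_` is greedy, a folder name containing `WTSI_` more than once would be cut at the last occurrence.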
132+
133+
Reading and concatenating samples.
134+
135+
```python
import numpy as np
import scanpy as sc


def read_and_qc(sample_name, file, path=sp_data_folder,
                count_file='filtered_feature_bc_matrix.h5'):
    """
    Read one Visium file and add minimum metadata and QC metrics to adata.obs
    NOTE: var_names is ENSEMBL ID as it should be, you can always plot with sc.pl.scatter(gene_symbols='SYMBOL')
    """

    adata = sc.read_visium(path + str(file) + '/',
                           count_file=count_file,
                           load_images=True)
    adata.obs['sample'] = sample_name
    adata.var['SYMBOL'] = adata.var_names
    adata.var.rename(columns={'gene_ids': 'ENSEMBL'}, inplace=True)
    adata.var_names = adata.var['ENSEMBL']
    adata.var.drop(columns='ENSEMBL', inplace=True)

    # just in case there are non-unique ENSEMBL IDs
    adata.var_names_make_unique()

    # Calculate QC metrics
    sc.pp.calculate_qc_metrics(adata, inplace=True)
    # mitochondrial genes are prefixed 'MT-' in human and 'mt-' in mouse
    adata.var['mt'] = [gene.upper().startswith('MT-') for gene in adata.var['SYMBOL']]
    adata.obs['mt_frac'] = np.asarray(
        adata[:, adata.var['mt'].tolist()].X.sum(1)
    ).squeeze() / adata.obs['total_counts']

    # add sample name to obs names
    adata.obs["sample"] = [str(i) for i in adata.obs['sample']]
    adata.obs_names = 's' + adata.obs["sample"] \
                      + '_' + adata.obs_names
    adata.obs.index.name = 'spot_id'

    # store the spatial metadata under the sample name instead of the library id
    library_id = list(adata.uns['spatial'].keys())[0]
    adata.uns['spatial'][sample_name] = adata.uns['spatial'].pop(library_id)
    print(adata.uns['spatial'].keys())

    return adata


def read_all_and_qc(
    sample_annot, Sample_ID_col, file_col, sp_data_folder,
    count_file='filtered_feature_bc_matrix.h5',
):
    """
    Read and concatenate all Visium files.
    """
    sample_ids = sample_annot[Sample_ID_col].astype(str).tolist()
    files = sample_annot[file_col].tolist()

    # read first sample
    adata = read_and_qc(
        sample_ids[0], files[0],
        path=sp_data_folder, count_file=count_file
    )

    # read the remaining samples, keeping each sample paired with its own file
    slides = {}
    for s, f in zip(sample_ids[1:], files[1:]):
        slides[s] = read_and_qc(s, f, path=sp_data_folder, count_file=count_file)

    # combine individual samples
    adata = adata.concatenate(
        list(slides.values()),
        batch_key="sample",
        uns_merge="unique",
        batch_categories=sample_ids,
        index_unique=None
    )

    # add sample annotations to adata.obs (on a copy, so the input table is not modified)
    annot = sample_annot.astype(str)
    annot.index = annot[Sample_ID_col]
    adata.obs[annot.columns] = annot.reindex(index=adata.obs['sample']).values

    return adata


# locations under tissue only
adata = read_all_and_qc(
    sample_annot=sample_annot,
    Sample_ID_col='Sample_ID',
    file_col='file',
    sp_data_folder=sp_data_folder,
    count_file='filtered_feature_bc_matrix.h5',
)

# all locations, including those not covered by tissue
adata_incl_nontissue = read_all_and_qc(
    sample_annot=sample_annot,
    Sample_ID_col='Sample_ID',
    file_col='file',
    sp_data_folder=sp_data_folder,
    count_file='raw_feature_bc_matrix.h5',
)
```
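When the remaining samples are read in a loop, each sample ID has to stay paired with its own file entry. The sketch below uses plain lists with made-up names to show why counting from zero over a slice that starts at the second element shifts the pairing by one, and how zipping the two slices avoids that.

```python
sample_ids = ["s1", "s2", "s3"]  # hypothetical sample IDs
files = ["f1", "f2", "f3"]       # hypothetical folder names, in the same order

# enumerate restarts at 0 on the sliced list, so files[i] lags one position behind
misaligned = [(s, files[i]) for i, s in enumerate(sample_ids[1:])]

# zipping both slices keeps each sample next to its own file
aligned = list(zip(sample_ids[1:], files[1:]))

print(misaligned)
print(aligned)
```

Passing `start=1` to `enumerate` would also realign the index, but `zip` removes the index arithmetic altogether.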

cell2location/__init__.py

Lines changed: 2 additions & 4 deletions
@@ -2,13 +2,11 @@
22
from pyro.distributions.transforms import SoftplusTransform
33
from torch.distributions import biject_to, transform_to
44

5-
from .run_c2l import run_cell2location
5+
from . import models
66
from .run_colocation import run_colocation
7-
from .run_regression import run_regression
87

98
__all__ = [
10-
"run_cell2location",
11-
"run_regression",
9+
"models",
1210
"run_colocation",
1311
]
1412
