
Commit 59e6c0c

giovp, almaan, github-actions[bot], almaan, hiraksarkar authored
[new task] Spatial decomposition (openproblems-bio#309)
* init spatial * spatial decomposition * init cleanup * run precommit * README update * updated readme * api update * pre-commit * readme update * api stylefix * pre-commit * metrics update * linting * pre-commit * black * linting fix * pre-commit * cleanup * cleanup * linting * linting * task name change * Rctd (#6) Co-authored-by: almaan <almaan@kth.se> * add stereoscope - nnls - nusvr - vanillanmf - nmfreg (#4) Co-authored-by: Hirak Sarkar <hiraksarkar.cs@gmail.com> * Seurat (#8) Co-authored-by: almaan <almaan@kth.se> * adding simulation (#7) Co-authored-by: giovp <giov.pll@gmail.com> Co-authored-by: almaan <almaan@kth.se> * reorder requirements * Update mse.py * update R2 description * review comments, populated __init__.py files for import * update import statements * fix label dataset * Specify image * fix random * fix labels * pre-commit * add test=False * add synth data from destVI * remove logger * return spatial reference in correct format * specify cell type label * add destVI simulation to datasets * fix random * fix nusvr * fix stereoscope * added destvi * added destvi * try fix data generation * fix from previous delete * add scvitools version * Synthetic data generation (#9) * tangram first * tangram first * tangram first * tangram first * flake8 + isort _destvi_utils * tangram update; pancreas add string index * tangram update; pancreas add string index * tangram update; pancreas add string index; n_obs = 1000 in synth data * tangram update; pancreas add string index; n_obs = 1000 in synth data * new synth * add tangram-sc to docker * new synth approach * new synth approach * new synth approach * new synth approach * new synth approach * new synth approach * new synth approach * new synth approach * pancreas subset integer; comment pancreas dataset [skip actions] * pancreas subset integer; comment pancreas dataset [skip actions] * pancreas subset integer; comment pancreas dataset [skip actions] * pancreas subset integer; comment pancreas dataset 
[skip actions] * pancreas subset integer; comment pancreas dataset [skip actions] * merge and split sc and st data * merged anndata in methods * merged anndata in methods * fix destvi * add code reference * shorten * Update openproblems/tasks/spatial_decomposition/_utils.py Co-authored-by: Giovanni Palla <giov.pll@gmail.com> * Update openproblems/tasks/spatial_decomposition/datasets/_sc_to_sp_utils.py [skip actions] Co-authored-by: Giovanni Palla <giov.pll@gmail.com> * Update openproblems/tasks/spatial_decomposition/datasets/_sc_to_sp_utils.py [skip actions] Co-authored-by: Giovanni Palla <giov.pll@gmail.com> * Update openproblems/tasks/spatial_decomposition/datasets/_sc_to_sp_utils.py [skip actions] Co-authored-by: Giovanni Palla <giov.pll@gmail.com> * comment fix * comment fix * fix pancreas dataset * update readme * fix destvi genertaion * fix sparse * minor fix * drop csr_matrix; fix double merge of anndata; update seurat v3 * updates * fix test for sparse arrays * test=False * add geos to r-extras * geos before r install * add software-properties-common * add python-software-properties * add RUN before command * rm geos from r-base * fix merging of anndata by pinning higher version * revert back anndata * fix obs_names and pin anndata * try to add swap * reduce number of spatial spots * remove swap * reduce obs * remove swap * remove step in CI * decrease dataset size * remove sparse * remove copy * remove datasets * remove datasets from init * address scott comments * skip all pancreas * fix import * remove destiv * Merge `main` into `synthetic-data-generation` (#10) Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: MalteDLuecken <m.d.luecken@gmail.com> Co-authored-by: Scott Gigante <84813314+scottgigante-immunai@users.noreply.github.com> Co-authored-by: SingleCellOpenProblems <singlecellopenproblems@protonmail.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> 
Co-authored-by: Daniel Strobl <50872326+danielStrobl@users.noreply.github.com> * change test * fix from_cache * pre-commit * update data generation to remove inf * change test * check task * resolve suggestions from scott Co-authored-by: almaan <almaan@kth.se> Co-authored-by: Giovanni Palla <giov.pll@gmail.com> Co-authored-by: Scott Gigante <84813314+scottgigante-immunai@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: MalteDLuecken <m.d.luecken@gmail.com> Co-authored-by: SingleCellOpenProblems <singlecellopenproblems@protonmail.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Daniel Strobl <50872326+danielStrobl@users.noreply.github.com> * Remove duplicate line * Fill in baseline decorator * Check R version of seurat * pre-commit * Remove reference to __from_cache__ * Clean up proportions assert * pre-commit * set merge='unique' to retain uns * fixes from scott comments * Fix code_version for new API * Set `max_epochs` on `test` * pre-commit * Temporarily remove destvi * Add dataset metadata fields * Add task summary * Temporarily remove steroscope * pre-commit * Fix typo * Copy `uns` * fix uns_merge to include _from_cache * convert NaNs in categorical dtypes * convert string dtypes * bump tangram-sc * fix string dtypes * convert strings to categoricals inside pancreas * change api label to str * obsm cannot be pd.DataFrame * revert anndata change * fix rctd * fix R2 * fix lots of things * fix nmfreg and sample method * address scott comments * add metadata attribute decorator * Update r_requirements.txt * Handle comments in `r_requirements.txt` * Rename spacexr * Move API below metrics * Rename NNLS * Fix RCTD code URL * Set n_pcs in RCTD python call * Revert 2077c35 * Set n_pca in seuratv3.py * use `n_pcs` in seuratv3.R * Split string rather than skipping QA * Shorten line lengths * shorten line lengths * Clean up comment * 
Delete pbmc3k_raw.h5ad * Rename R2.py to r2.py * Fix reference to r2.py * pre-commit * Rename sc_to_sp.py to pancreas.py * Rename _sc_and_sp_utils.py to utils.py * rename _utils.py to utils.py * pre-commit * import all pancreas datasets * fix typo * fix namespace clash * need to pass test arg * fix method name (0_1 -> 0_5) * check tower auth explicitly * filter genes and cells * filter_genes_cells is in-place * remaining todos from scott * add destvi dataset * delete scvi models and dataset * fix shell string * one more syntactic fix * Add tangram to readme * Specify cell types in description * pre-commit * Better dataset descriptors * Clean up * Split don't skip * handle random_state * Fix doi URL * Move import inside * Shorten line lengths * Remove commented imports * Shorten descriptors * Fix seuratv3 URL * Remove unused projection_type arg * Remove unused toarray * Remove unused toarray * Update vanillanmf.py * Remove unused DataFrame handler * Remove unused categorical handler * Remove unused pandas import * update nmfreg * fix nmfreg * fix vanilla * fix nmf * fix alpha * rctd * pre-commit * add dataset_reference * shorten line lengths * document PYTEST_MAX_RETRIES * Allow 429 too many requests Co-authored-by: almaan <almaan@kth.se> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Alma Andersson <kangarooblood@gmail.com> Co-authored-by: Hirak Sarkar <hiraksarkar.cs@gmail.com> Co-authored-by: Daniel Burkhardt <burkhardt.d.b@gmail.com> Co-authored-by: Scott Gigante <84813314+scottgigante-immunai@users.noreply.github.com> Co-authored-by: MalteDLuecken <m.d.luecken@gmail.com> Co-authored-by: SingleCellOpenProblems <singlecellopenproblems@protonmail.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Daniel Strobl <50872326+danielStrobl@users.noreply.github.com> Co-authored-by: Scott Gigante <scott.gigante@immunai.com>
1 parent d20f2fd commit 59e6c0c

28 files changed

Lines changed: 842 additions & 4 deletions


.github/workflows/run_tests.yml

Lines changed: 22 additions & 0 deletions
@@ -240,6 +240,28 @@ jobs:
         sudo apt-get update -qq || (sudo rm /etc/apt/sources.list.d/* && sudo apt-get update -yqq)
         sudo apt-get install -qy --no-install-recommends libhdf5-dev pandoc gfortran libblas-dev liblapack-dev libedit-dev llvm-dev
 
+    - name: Check Tower authentication
+      env:
+        TOWER_ACCESS_TOKEN: ${{ secrets.TOWER_ACCESS_KEY }}
+        TOWER_ORG_ID: 9180144064527
+        TOWER_WORKSPACE_ID: 53907369739130
+      run: |
+        result=$(
+          curl -X GET "https://api.tower.nf/orgs/${TOWER_ORG_ID}/workspaces/${TOWER_WORKSPACE_ID}" \
+            -H "Accept: application/json" \
+            -H "Authorization: Bearer ${TOWER_ACCESS_TOKEN}" || \
+          (
+            echo "Tower authentication failed; set TOWER_ACCESS_KEY in repo secrets." >&2 &&
+            echo "See CONTRIBUTING.md for details." >&2 && \
+            exit 1
+          )
+        )
+        if [ -z "${result}" ]; then
+          echo "Tower authentication failed; email singlecellopenproblems@protonmail.com with a request to be added to the openproblems-bio workspace"
+          exit 1
+        fi
+      shell: /bin/bash -e {0}
+
     - name: Set up Python
       uses: actions/setup-python@v4
       with:
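
The new workflow step treats the check as passed whenever `curl` exits successfully and prints a non-empty body. Without `--fail`, `curl` also exits successfully on HTTP error responses, so a rejected token that comes back with an error body would likely pass. The sketch below is not part of this commit. It restates the same request in Python and decides on the HTTP status instead, assuming the endpoint answers a rejected token with a 4xx status. The function names `build_workspace_request` and `tower_auth_ok` are hypothetical; the URL and headers are the ones in the diff.

```python
import urllib.error
import urllib.request

TOWER_API = "https://api.tower.nf"


def build_workspace_request(org_id, workspace_id, token):
    """Build (but do not send) the GET request issued by the workflow step."""
    url = f"{TOWER_API}/orgs/{org_id}/workspaces/{workspace_id}"
    return urllib.request.Request(
        url,
        headers={
            "Accept": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="GET",
    )


def tower_auth_ok(request, opener=urllib.request.urlopen):
    """Return True only for a 2xx response with a non-empty body.

    `opener` is injectable so the logic can be exercised without network access.
    """
    try:
        with opener(request) as response:
            return 200 <= response.status < 300 and bool(response.read())
    except urllib.error.URLError:
        # urlopen raises HTTPError (a URLError subclass) for 4xx/5xx responses,
        # so a rejected token ends up here rather than yielding a body.
        return False
```

In a workflow, a script built on this would read the token and IDs from the same environment variables the step already defines and exit non-zero when `tower_auth_ok` returns `False`.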

CONTRIBUTING.md

Lines changed: 2 additions & 2 deletions
@@ -381,10 +381,10 @@ pip install --editable .[test,r]
 pytest -v
 ```
 
-You may run specific tests with
+You may run specific tests quickly with
 
 ```shell
-pytest -k my_task
+PYTEST_MAX_RETRIES=0 pytest -k my_task
 ```
 
 The test suite also requires Python>=3.7, R>=4.0, and Docker to be installed.

docker/openproblems-python-extras/README.md

Lines changed: 2 additions & 0 deletions
@@ -15,7 +15,9 @@ Python packages:
 * molecular-cross-validation
 * MulticoreTSNE
 * phate
+* scvi-tools
 * neuralee
+* tangram-sc
 * torch
 * tensorflow-cpu
 * ivis

docker/openproblems-python-extras/requirements.txt

Lines changed: 2 additions & 0 deletions
@@ -7,6 +7,8 @@ MulticoreTSNE==0.1
 phate==1.0.*
 pybedtools==0.9.*
 pyensembl==2.0.*
+scvi-tools==0.16.*
+tangram-sc==1.0.*
 tensorflow-cpu==2.7.*
 torch==1.11.*
 xgboost==1.6.*

docker/openproblems-r-extras/README.md

Lines changed: 2 additions & 0 deletions
@@ -13,6 +13,8 @@ R packages:
 * batchelor
 * sparsesvd
 * dplyr
+* spacexr
+* Seurat
 
 Python packages:

docker/openproblems-r-extras/r_requirements.txt

Lines changed: 1 addition & 0 deletions
@@ -6,6 +6,7 @@ conos@1.4.6
 crayon@1.5.1
 dbplyr@2.2.1
 devtools@2.3.2
+dmcable/spacexr@6a593aa # master
 dplyr@1.0.9
 e1071@1.7-11
 ellipsis@0.3.2

openproblems/data/pancreas.py

Lines changed: 23 additions & 0 deletions
@@ -1,5 +1,7 @@
 from . import utils
 
+import anndata as ad
+import numpy as np
 import os
 import scanpy as sc
 import scprep
@@ -47,3 +49,24 @@ def load_pancreas(test=False):
     utils.filter_genes_cells(adata)
 
     return adata
+
+
+def get_pancreas_integer(adata: ad.AnnData):
+    """Keep only cells with integer counts.
+
+    For some platforms the pancreas data set only has processed counts.
+    Here we grab those with integer counts.
+    See https://github.com/theislab/scib-reproducibility/tree/main/notebooks/data_preprocessing/pancreas # noqa: E501
+    """
+    is_int = ["smartseq2"]
+    is_int += ["inDrop{}".format(x) for x in range(1, 5)]
+
+    keep = np.zeros(len(adata)).astype(bool)
+
+    for tech in is_int:
+        idx = adata.obs.tech.values == tech
+        keep = keep | idx
+
+    adata = adata[keep, :].copy()
+
+    return adata
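
The loop in `get_pancreas_integer` builds a boolean mask by OR-ing one equality test per technology. A minimal sketch of that selection on a plain NumPy array of technology labels follows, together with an equivalent `np.isin` call. The helper name `integer_count_mask` and the toy labels are illustrative assumptions, not part of the commit, and the sketch does not use AnnData.

```python
import numpy as np


def integer_count_mask(tech_labels):
    """Return a boolean mask selecting smartseq2 and inDrop1..inDrop4 entries."""
    tech_labels = np.asarray(tech_labels)

    # Same list as in the diff: range(1, 5) yields 1, 2, 3, 4.
    is_int = ["smartseq2"] + ["inDrop{}".format(x) for x in range(1, 5)]

    # Loop form, mirroring the committed code.
    keep = np.zeros(len(tech_labels), dtype=bool)
    for tech in is_int:
        keep |= tech_labels == tech

    # Vectorised form that should give the same mask.
    assert np.array_equal(keep, np.isin(tech_labels, is_int))
    return keep


# Toy labels for illustration only.
tech = np.array(["smartseq2", "celseq", "inDrop1", "inDrop4", "fluidigmc1"])
mask = integer_count_mask(tech)
selected = tech[mask]
```

With an `AnnData` object, the mask would be computed from `adata.obs.tech.values` and applied as `adata[mask, :].copy()`, as the committed function does.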

openproblems/tasks/__init__.py

Lines changed: 1 addition & 0 deletions
@@ -3,4 +3,5 @@
 from . import label_projection
 from . import multimodal_data_integration
 from . import regulatory_effect_prediction
+from . import spatial_decomposition
 from ._batch_integration import batch_integration_graph
Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
+# Spatial Decomposition/Deconvolution
+
+## The task
+
+Spatial decomposition (also often referred to as spatial deconvolution) is
+applicable to spatial transcriptomics data where the transcription profile of
+each capture location (spot, voxel, bead, etc.) does not share a bijective
+relationship with the cells in the tissue, i.e., multiple cells may contribute
+to the same capture location. The task of spatial decomposition then refers to
+estimating the composition of cell types/states that are present at each capture
+location. The cell type/state estimates are presented as proportion values,
+representing the proportion of the cells at each capture location that belong to
+a given cell type.
+
+We distinguish between _reference-based_ decomposition and _de novo_
+decomposition, where the former leverages external data (e.g., scRNA-seq or
+scNuc-seq) to guide the inference process, while the latter works only with the
+spatial data. We require that all datasets have an associated reference single
+cell data set, but methods are free to ignore this information.
+
+## Metrics
+
+### R2
+
+R2, pronounced "R squared", is also known as the "coefficient of determination". R2
+reports the fraction of the true proportion values' (`adata.obsm["proportions_true"]`)
+variance that can be explained by the predicted proportion values
+(`adata.obsm["proportions_pred"]`). The **best score**, and upper bound, is 1.0. There is
+no fixed lower bound for the metric. The _uniform/non-weighted average_ across all cell
+types/states is used to summarize performance. See the
+[sklearn](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.r2_score.html)
+documentation for details on the implementation and the
+[Wikipedia](https://en.wikipedia.org/wiki/Coefficient_of_determination) article for more
+general information regarding the metric.
+
+## API
+
+Datasets consist of 2 `anndata.AnnData` objects, concatenated by key
+`adata.obs["modality"]` with values:
+
+* `sc` for the single cell reference.
+* `sp` for the target spatial dataset.
+
+In the single cell reference, cell types are stored in `adata_sc.obs["label"]`.
+In the spatial target, ground-truth cell-type proportions are stored in
+`adata_spatial.obsm["proportions_true"]`.
+Methods should return only the spatial data with inferred proportions stored in
+`adata_spatial.obsm["proportions_pred"]`.
+Metrics shall compare `adata_spatial.obsm['proportions_pred']` to
+`adata_spatial.obsm['proportions_true']`.
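
The README describes R2 as a per-cell-type coefficient of determination, averaged with equal weight across cell types. The metric's implementation is not part of the excerpt above, so the following is only a sketch of that definition in plain NumPy. The function name `uniform_average_r2` and the toy proportion matrices (rows are capture locations, columns are cell types) are assumptions, and the sketch assumes every cell type has non-zero variance in the true proportions.

```python
import numpy as np


def uniform_average_r2(proportions_true, proportions_pred):
    """Unweighted mean over cell types (columns) of the per-column R2."""
    y_true = np.asarray(proportions_true, dtype=float)
    y_pred = np.asarray(proportions_pred, dtype=float)

    # Residual and total sums of squares, computed per cell type.
    ss_res = ((y_true - y_pred) ** 2).sum(axis=0)
    ss_tot = ((y_true - y_true.mean(axis=0)) ** 2).sum(axis=0)

    # Assumes ss_tot > 0 for every column.
    return float(np.mean(1.0 - ss_res / ss_tot))


# Toy data: 3 capture locations, 2 cell types.
true = [[0.2, 0.8], [0.5, 0.5], [0.8, 0.2]]

perfect = uniform_average_r2(true, true)                  # exact prediction
mean_only = uniform_average_r2(true, [[0.5, 0.5]] * 3)    # predicts the mean
swapped = uniform_average_r2(true, [row[::-1] for row in true])
```

Here `perfect` reaches the upper bound of 1.0, `mean_only` is approximately 0, and `swapped` is negative, which illustrates why the metric has no fixed lower bound.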
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
+from ...utils import get_callable_members
+from . import api
+from . import datasets
+from . import methods
+from . import metrics
+
+_task_name = "Spatial Decomposition"
+_task_summary = (
+    "Decompose spatial gene expression data by cell type from single cell reference"
+)
+
+DATASETS = get_callable_members(datasets)
+METHODS = get_callable_members(methods)
+METRICS = get_callable_members(metrics)
