|
| 1 | +# Datasets |
1 | 2 |
|
2 | | -- <a href="#common-datasets" id="toc-common-datasets">Common datasets</a> |
3 | | - - <a href="#pipeline-topology" id="toc-pipeline-topology">Pipeline |
4 | | - topology</a> |
5 | | - - <a href="#file-format-api" id="toc-file-format-api">File format API</a> |
6 | | - - <a href="#datasetpcahvg" |
7 | | - id="toc-datasetpcahvg"><code>Dataset+Pca+Hvg</code></a> |
8 | | - - <a href="#normalized-dataset" |
9 | | - id="toc-normalized-dataset"><code>Normalized Dataset</code></a> |
10 | | - - <a href="#datasetpca" id="toc-datasetpca"><code>Dataset+Pca</code></a> |
11 | | - - <a href="#raw-dataset" id="toc-raw-dataset"><code>Raw Dataset</code></a> |
12 | | - - <a href="#component-api" id="toc-component-api">Component API</a> |
13 | | - - <a href="#dataset-loader" |
14 | | - id="toc-dataset-loader"><code>Dataset Loader</code></a> |
15 | | - - <a href="#normalization" |
16 | | - id="toc-normalization"><code>Normalization</code></a> |
17 | | - - <a href="#processor-hvg" |
18 | | - id="toc-processor-hvg"><code>Processor Hvg</code></a> |
19 | | - - <a href="#processor-pca" |
20 | | - id="toc-processor-pca"><code>Processor Pca</code></a> |
21 | | - |
22 | | -# Common datasets |
23 | | - |
24 | | -## Pipeline topology |
25 | | - |
26 | | -``` mermaid |
27 | | -%%| column: screen-inset-shaded |
28 | | -flowchart LR |
29 | | - file_dataset(Dataset+Pca+Hvg) |
30 | | - file_normalized(Normalized Dataset) |
31 | | - file_pca(Dataset+Pca) |
32 | | - file_raw(Raw Dataset) |
33 | | - comp_dataset_loader[/Dataset Loader/] |
34 | | - comp_normalization[/Normalization/] |
35 | | - comp_processor_hvg[/Processor Hvg/] |
36 | | - comp_processor_pca[/Processor Pca/] |
37 | | - file_raw---comp_normalization |
38 | | - file_pca---comp_processor_hvg |
39 | | - file_normalized---comp_processor_pca |
40 | | - comp_dataset_loader-->file_raw |
41 | | - comp_normalization-->file_normalized |
42 | | - comp_processor_hvg-->file_dataset |
43 | | - comp_processor_pca-->file_pca |
44 | | -``` |
45 | | - |
46 | | -## File format API |
47 | | - |
48 | | -### `Dataset+Pca+Hvg` |
49 | | - |
50 | | -A normalised data with a PCA embedding and HVG selection |
51 | | - |
52 | | -Used in: |
53 | | - |
54 | | -- [processor hvg](#processor%20hvg): output (as output) |
55 | | - |
56 | | -Slots: |
57 | | - |
58 | | -| struct | name | type | description | |
59 | | -|:-------|:-----------------|:--------|:------------------------------------------------------------------------| |
60 | | -| layers | counts | integer | Raw counts | |
61 | | -| layers | normalized | double | Normalised expression values | |
62 | | -| obs | celltype | string | Cell type information | |
63 | | -| obs | batch | string | Batch information | |
64 | | -| obs | tissue | string | Tissue information | |
65 | | -| obs | size_factors | double | The size factors created by the normalisation method, if any. | |
66 | | -| var | hvg | boolean | Whether or not the feature is considered to be a ‘highly variable gene’ | |
67 | | -| var | hvg_score | integer | A ranking of the features by hvg. | |
68 | | -| obsm | X_pca | double | The resulting PCA embedding. | |
69 | | -| varm | pca_loadings | double | The PCA loadings matrix. | |
70 | | -| uns | dataset_id | string | A unique identifier for the dataset | |
71 | | -| uns | normalization_id | string | Which normalization was used | |
72 | | -| uns | pca_variance | double | The PCA variance objects. | |
73 | | - |
74 | | -Example: |
75 | | - |
76 | | - AnnData object |
77 | | - obs: 'celltype', 'batch', 'tissue', 'size_factors' |
78 | | - var: 'hvg', 'hvg_score' |
79 | | - uns: 'dataset_id', 'normalization_id', 'pca_variance' |
80 | | - obsm: 'X_pca' |
81 | | - varm: 'pca_loadings' |
82 | | - layers: 'counts', 'normalized' |
83 | | - |
84 | | -### `Normalized Dataset` |
85 | | - |
86 | | -A normalized dataset |
87 | | - |
88 | | -Used in: |
89 | | - |
90 | | -- [normalization](#normalization): output (as output) |
91 | | -- [processor pca](#processor%20pca): input (as input) |
92 | | - |
93 | | -Slots: |
94 | | - |
95 | | -| struct | name | type | description | |
96 | | -|:-------|:-----------------|:--------|:--------------------------------------------------------------| |
97 | | -| layers | counts | integer | Raw counts | |
98 | | -| layers | normalized | double | Normalised expression values | |
99 | | -| obs | celltype | string | Cell type information | |
100 | | -| obs | batch | string | Batch information | |
101 | | -| obs | tissue | string | Tissue information | |
102 | | -| obs | size_factors | double | The size factors created by the normalisation method, if any. | |
103 | | -| uns | dataset_id | string | A unique identifier for the dataset | |
104 | | -| uns | normalization_id | string | Which normalization was used | |
105 | | - |
106 | | -Example: |
107 | | - |
108 | | - AnnData object |
109 | | - obs: 'celltype', 'batch', 'tissue', 'size_factors' |
110 | | - uns: 'dataset_id', 'normalization_id' |
111 | | - layers: 'counts', 'normalized' |
112 | | - |
113 | | -### `Dataset+Pca` |
114 | | - |
115 | | -A normalised data with a PCA embedding |
116 | | - |
117 | | -Used in: |
118 | | - |
119 | | -- [processor hvg](#processor%20hvg): input (as input) |
120 | | -- [processor pca](#processor%20pca): output (as output) |
121 | | - |
122 | | -Slots: |
123 | | - |
124 | | -| struct | name | type | description | |
125 | | -|:-------|:-----------------|:--------|:--------------------------------------------------------------| |
126 | | -| layers | counts | integer | Raw counts | |
127 | | -| layers | normalized | double | Normalised expression values | |
128 | | -| obs | celltype | string | Cell type information | |
129 | | -| obs | batch | string | Batch information | |
130 | | -| obs | tissue | string | Tissue information | |
131 | | -| obs | size_factors | double | The size factors created by the normalisation method, if any. | |
132 | | -| obsm | X_pca | double | The resulting PCA embedding. | |
133 | | -| varm | pca_loadings | double | The PCA loadings matrix. | |
134 | | -| uns | dataset_id | string | A unique identifier for the dataset | |
135 | | -| uns | normalization_id | string | Which normalization was used | |
136 | | -| uns | pca_variance | double | The PCA variance objects. | |
137 | | - |
138 | | -Example: |
139 | | - |
140 | | - AnnData object |
141 | | - obs: 'celltype', 'batch', 'tissue', 'size_factors' |
142 | | - uns: 'dataset_id', 'normalization_id', 'pca_variance' |
143 | | - obsm: 'X_pca' |
144 | | - varm: 'pca_loadings' |
145 | | - layers: 'counts', 'normalized' |
146 | | - |
147 | | -### `Raw Dataset` |
148 | | - |
149 | | -An unprocessed dataset as output by a dataset loader. |
150 | | - |
151 | | -Used in: |
152 | | - |
153 | | -- [dataset loader](#dataset%20loader): output (as output) |
154 | | -- [normalization](#normalization): input (as input) |
155 | | - |
156 | | -Slots: |
157 | | - |
158 | | -| struct | name | type | description | |
159 | | -|:-------|:-----------|:--------|:------------------------------------| |
160 | | -| layers | counts | integer | Raw counts | |
161 | | -| obs | celltype | string | Cell type information | |
162 | | -| obs | batch | string | Batch information | |
163 | | -| obs | tissue | string | Tissue information | |
164 | | -| uns | dataset_id | string | A unique identifier for the dataset | |
165 | | - |
166 | | -Example: |
167 | | - |
168 | | - AnnData object |
169 | | - obs: 'celltype', 'batch', 'tissue' |
170 | | - uns: 'dataset_id' |
171 | | - layers: 'counts' |
172 | | - |
173 | | -## Component API |
174 | | - |
175 | | -### `Dataset Loader` |
176 | | - |
177 | | -Arguments: |
178 | | - |
179 | | -| Name | Type | Direction | Description | |
180 | | -|:-----------|:------------------------------|:----------|:------------------------------------------------------| |
181 | | -| `--output` | [Raw Dataset](#Raw%20dataset) | output | An unprocessed dataset as output by a dataset loader. | |
182 | | - |
183 | | -### `Normalization` |
184 | | - |
185 | | -Arguments: |
186 | | - |
187 | | -| Name | Type | Direction | Description | |
188 | | -|:---------------------|:--------------------------------------------|:----------|:-------------------------------------------------------------| |
189 | | -| `--input` | [Raw Dataset](#Raw%20dataset) | input | An unprocessed dataset as output by a dataset loader. | |
190 | | -| `--output` | [Normalized Dataset](#Normalized%20dataset) | output | A normalized dataset | |
191 | | -| `--layer_output` | `string` | input | The name of the layer in which to store the normalized data. | |
192 | | -| `--obs_size_factors` | `string` | input | In which .obs slot to store the size factors (if any). | |
193 | | - |
194 | | -### `Processor Hvg` |
195 | | - |
196 | | -Arguments: |
197 | | - |
198 | | -| Name | Type | Direction | Description | |
199 | | -|:------------------|:------------------------------------|:----------|:---------------------------------------------------------------------------| |
200 | | -| `--input` | [Dataset+Pca](#Dataset+PCA) | input | A normalised data with a PCA embedding | |
201 | | -| `--layer_input` | `string` | input | Which layer to use as input for the PCA. | |
202 | | -| `--output` | [Dataset+Pca+Hvg](#Dataset+PCA+HVG) | output | A normalised data with a PCA embedding and HVG selection | |
203 | | -| `--var_hvg` | `string` | input | In which .var slot to store whether a feature is considered to be hvg. | |
204 | | -| `--var_hvg_score` | `string` | input | In which .var slot to store whether a ranking of the features by variance. | |
205 | | -| `--num_features` | `integer` | input | The number of HVG to select | |
206 | | - |
207 | | -### `Processor Pca` |
208 | | - |
209 | | -Arguments: |
210 | | - |
211 | | -| Name | Type | Direction | Description | |
212 | | -|:-------------------|:--------------------------------------------|:----------|:---------------------------------------------------------------------------------------------------------------------| |
213 | | -| `--input` | [Normalized Dataset](#Normalized%20dataset) | input | A normalized dataset | |
214 | | -| `--layer_input` | `string` | input | Which layer to use as input for the PCA. | |
215 | | -| `--output` | [Dataset+Pca](#Dataset+PCA) | output | A normalised data with a PCA embedding | |
216 | | -| `--obsm_embedding` | `string` | input | In which .obsm slot to store the resulting embedding. | |
217 | | -| `--varm_loadings` | `string` | input | In which .varm slot to store the resulting loadings matrix. | |
218 | | -| `--uns_variance` | `string` | input | In which .uns slot to store the resulting variance objects. | |
219 | | -| `--num_components` | `integer` | input | Number of principal components to compute. Defaults to 50, or 1 - minimum dimension size of selected representation. | |
| 3 | +# This directory has been moved to [https://github.com/openproblems-bio/datasets](https://github.com/openproblems-bio/datasets)! |