
Commit 3fee607

fix typo - installation
1 parent 876add3 commit 3fee607

1 file changed

Lines changed: 32 additions & 33 deletions

File tree

README.md

@@ -4,7 +4,7 @@
 </h1>
 
 <p align="center">
-<strong>A GPU-accelerated tool for large scale scRNA-seq pipeline.</strong>
+<strong>A GPU-accelerated tool for large-scale scRNA-seq pipelines.</strong>
 </p>
 
 <!-- <p align="center">
@@ -29,7 +29,7 @@
 
 - Fast scRNA-seq pipeline including QC, Normalization, Batch-effect Removal, and Dimension Reduction in a ***similar syntax*** to `scanpy` and `rapids-singlecell`.
 - Scales to datasets with more than ***10M cells*** on a ***single*** GPU (A100 80G).
-- Chunk the data to avoid the ***`int32` limitation*** in `cupyx.scipy.sparse` used by `rapids-singlecell` that disables the computing for moderate-size dataset (~1.3M) without Multi-GPU support.
+- Chunks the data to avoid the ***`int32` limitation*** in `cupyx.scipy.sparse` used by `rapids-singlecell`, which blocks computation on moderate-size datasets (~1.3M cells) without multi-GPU support.
 - Reconciles output at each step to ***`scanpy`*** to reproduce the ***same*** results as on the CPU end.
 - Improvement on ***`harmonypy`*** that allows datasets with more than ***10M cells*** and more than ***1000 samples*** to be run on a single GPU.
 - Speeds up and optimizes the ***`NSForest`*** algorithm using GPU for ***better*** marker gene identification.
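The chunking trick in the bullet above can be sketched in plain Python. `row_chunks` below is a hypothetical illustration, not the scaleSC API (the actual reader is `scalesc.util.AnnDataBatchReader`): it groups rows so that each chunk's nonzero count stays below the `int32` bound that `cupyx.scipy.sparse` imposes on CSR indices.

```python
# cupyx.scipy.sparse stores CSR indices/indptr as int32, so a single chunk's
# nonzero count must stay below 2**31 - 1. This hypothetical helper sketches
# how rows can be grouped into chunks that respect that limit.
INT32_MAX = 2**31 - 1

def row_chunks(nnz_per_row, limit=INT32_MAX):
    """Return (start, end) row ranges whose total nnz stays under `limit`."""
    chunks, start, running = [], 0, 0
    for i, nnz in enumerate(nnz_per_row):
        if running and running + nnz > limit:  # close the current chunk
            chunks.append((start, i))
            start, running = i, 0
        running += nnz
    chunks.append((start, len(nnz_per_row)))
    return chunks
```

Each chunk then fits in a separate int32-indexed sparse matrix on the GPU, which is why a ~1.3M-cell dataset no longer needs multi-GPU support.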
@@ -85,27 +85,26 @@
 Requirements:
 - [**RAPIDS**](https://rapids.ai/) from Nvidia
 - [**rapids-singlecell**](https://rapids-singlecell.readthedocs.io/en/latest/index.html), an alternative to *scanpy* that employs the GPU for acceleration.
-- [**Conda**](https://docs.conda.io/projects/conda/en/latest/index.html), version >=22.11 is strongly encoruaged, because *conda-libmamba-solver* is set as default, which significant speeds up solving dependencies.
+- [**Conda**](https://docs.conda.io/projects/conda/en/latest/index.html), version >=22.11 is strongly encouraged, because *conda-libmamba-solver* is set as the default, which significantly speeds up solving dependencies.
 - **pip**, a Python package installer.
 
 Environment Setup:
 1. Install [**RAPIDS**](https://rapids.ai/) through Conda, \
-`conda create -n scalesc -c rapidsai -c conda-forge -c nvidia \
-rapids=25.02 python=3.12 'cuda-version>=12.0,<=12.8`
-Users have flexibility to install it according to their systems by using this [online selector](https://docs.rapids.ai/install/?_gl=1*1em94gj*_ga*OTg5MDQyNDkyLjE3MjM0OTAyNjk.*_ga_RKXFW6CM42*MTczMDIxNzIzOS4yLjAuMTczMDIxNzIzOS42MC4wLjA.#selector). We highly recommand to install `**RAPIDS**>=24.12`, it solves a bug related to the leiden algorithm which results in too many clusters.
+`conda create -n scalesc -c rapidsai -c conda-forge -c nvidia rapids=25.02 python=3.12 'cuda-version>=12.0,<=12.8'`
+Users have the flexibility to install it according to their systems by using this [online selector](https://docs.rapids.ai/install/?_gl=1*1em94gj*_ga*OTg5MDQyNDkyLjE3MjM0OTAyNjk.*_ga_RKXFW6CM42*MTczMDIxNzIzOS4yLjAuMTczMDIxNzIzOS42MC4wLjA.#selector). We highly recommend installing `**RAPIDS**>=24.12`, since it fixes a bug in the Leiden algorithm that results in too many clusters.
 
 2. Activate the conda env, \
 `conda activate scalesc`
 3. Install [**rapids-singlecell**](https://rapids-singlecell.readthedocs.io/en/latest/index.html) using pip, \
 `pip install rapids-singlecell`
 
 4. Install scaleSC,
-    - pull scaleSC from github \
+    - Pull scaleSC from GitHub \
 `git clone https://github.com/interactivereport/scaleSC.git`
-    - enter the folder and install scaleSC \
+    - Enter the folder and install scaleSC \
 `cd scaleSC` \
 `pip install .`
-5. check env:
+5. Check env:
 - `python -c "import scalesc; print(scalesc.__version__)"` == 0.1.0
 - `python -c "import cupy; print(cupy.__version__)"` >= 13.3.0
 - `python -c "import cuml; print(cuml.__version__)"` >= 24.10
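The checks in step 5 compare printed version strings by eye; if you script them, note that naive string comparison is misleading (`'9.0' > '24.10'` lexicographically). A tiny hypothetical helper, not part of scaleSC, for comparing versions numerically:

```python
def version_tuple(v):
    """Parse a dotted version string like '13.3.0' into a tuple of ints.

    Hypothetical convenience helper for scripting the env checks above;
    non-numeric segments (e.g. 'rc1') are simply skipped in this sketch.
    """
    return tuple(int(part) for part in v.split(".") if part.isdigit())

# e.g. version_tuple(cupy.__version__) >= version_tuple("13.3.0")
```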
@@ -125,9 +124,9 @@ Please cite [ScaleSC](https://doi.org/10.1101/2025.01.28.635256), and [Scanpy](h
 
 ## Updates:
 - 2/26/2025:
-    - adding a parameter `threshold` in function `adata_cluster_merge` to support cluster merging at various scales according to user's specification. `threshold` is between 0 and 1. set to 0 by default.
-    - updating a few more examples of cluster merging in the tutorial.
-    - future work: adding supports for loading from large `.h5ad` files.
+    - Added a parameter `threshold` to `adata_cluster_merge` to support cluster merging at various scales according to the user's specification. `threshold` is between 0 and 1, set to 0 by default.
+    - Added a few more examples of cluster merging in the tutorial.
+    - Future work: adding support for loading from large `.h5ad` files.
 
 
 
@@ -144,7 +143,7 @@ Please cite [ScaleSC](https://doi.org/10.1101/2025.01.28.635256), and [Scanpy](h
 ## <kbd>class</kbd> `ScaleSC`
 ScaleSC integrated pipeline in a scanpy-like style.
 
-It will automatcially load dataset in chunks, see `scalesc.util.AnnDataBatchReader` for details, and all methods in this class manipulate this chunked data.
+It automatically loads the dataset in chunks (see `scalesc.util.AnnDataBatchReader` for details), and all methods in this class manipulate this chunked data.
 
 
 
@@ -156,7 +155,7 @@ It will automatcially load dataset in chunks, see `scalesc.util.AnnDataBatchRead
 - <b>`max_cell_batch`</b> (`int`): Maximum number of cells in a single batch.
     - <b>`Default`</b>: 100000.
 - <b>`preload_on_cpu`</b> (`bool`): Whether to load the entire chunked data on CPU. Default: `True`.
-- <b>`preload_on_gpu`</b> (`bool`): If load the entire chunked data on GPU, `preload_on_cpu` will be overwritten to `True` when this sets to `True`. Default is `True`.
+- <b>`preload_on_gpu`</b> (`bool`): Whether to load the entire chunked data on GPU; `preload_on_cpu` is overwritten to `True` when this is set to `True`. Default: `True`.
 - <b>`save_raw_counts`</b> (`bool`): Whether to save `adata_X` to disk after QC filtering.
     - <b>`Default`</b>: False.
 - <b>`save_norm_counts`</b> (`bool`): Whether to save `adata_X` to disk after normalization.
@@ -193,15 +192,15 @@ __init__(
 
 #### <kbd>property</kbd> adata
 
-`AnnData`: An AnnData object that used to store all intermediate results without the count matrix.
+`AnnData`: An `AnnData` object that is used to store all intermediate results without the count matrix.
 
-Note: This is always on CPU.
+Note: This is always on the CPU.
 
 ---
 
 #### <kbd>property</kbd> adata_X
 
-`AnnData`: An `AnnData` object that used to store all intermediate results including the count matrix. Internally, all chunks should be merged on CPU to avoid high GPU consumption, make sure to invoke `to_CPU()` before calling this object.
+`AnnData`: An `AnnData` object that is used to store all intermediate results, including the count matrix. Internally, all chunks should be merged on CPU to avoid high GPU consumption; make sure to invoke `to_CPU()` before calling this object.
 
 
 
@@ -239,15 +238,15 @@ Clean the memory
 filter_cells(min_count=0, max_count=None, qc_var='n_genes_by_counts', qc=False)
 ```
 
-Filter genes based on number of a QC metric.
+Filter cells based on a QC metric.
 
 
 
 **Args:**
 
 - <b>`min_count`</b> (`int`): Minimum number of counts required for a cell to pass filtering.
 - <b>`max_count`</b> (`int`): Maximum number of counts required for a cell to pass filtering.
-- <b>`qc_var`</b> (`str`='n_genes_by_counts'): Feature in QC metrics that used to filter cells.
+- <b>`qc_var`</b> (`str`='n_genes_by_counts'): Feature in QC metrics that is used to filter cells.
 - <b>`qc`</b> (`bool`=`False`): Call `calculate_qc_metrics` before filtering.
 
 ---
@@ -260,15 +259,15 @@ Filter genes based on number of a QC metric.
 filter_genes(min_count=0, max_count=None, qc_var='n_cells_by_counts', qc=False)
 ```
 
-Filter genes based on number of a QC metric.
+Filter genes based on a QC metric.
 
 
 
 **Args:**
 
 - <b>`min_count`</b> (`int`): Minimum number of counts required for a gene to pass filtering.
 - <b>`max_count`</b> (`int`): Maximum number of counts required for a gene to pass filtering.
-- <b>`qc_var`</b> (`str`='n_cells_by_counts'): Feature in QC metrics that used to filter genes.
+- <b>`qc_var`</b> (`str`='n_cells_by_counts'): Feature in QC metrics that is used to filter genes.
 - <b>`qc`</b> (`bool`=`False`): Call `calculate_qc_metrics` before filtering.
 
 ---
@@ -289,23 +288,23 @@ filter_genes_and_cells(
 )
 ```
 
-Filter genes based on number of a QC metric.
+Filter genes and cells based on QC metrics.
 
 
 
 **Note:**
 
-> This is an efficient way to perform a regular filtering on genes and cells without repeatedly iterating over chunks.
+> This is an efficient way to perform regular filtering on genes and cells without repeatedly iterating over chunks.
 >
 
 **Args:**
 
 - <b>`min_counts_per_gene`</b> (`int`): Minimum number of counts required for a gene to pass filtering.
 - <b>`max_counts_per_gene`</b> (`int`): Maximum number of counts required for a gene to pass filtering.
-- <b>`qc_var_gene`</b> (`str`='n_cells_by_counts'): Feature in QC metrics that used to filter genes.
+- <b>`qc_var_gene`</b> (`str`='n_cells_by_counts'): Feature in QC metrics that is used to filter genes.
 - <b>`min_counts_per_cell`</b> (`int`): Minimum number of counts required for a cell to pass filtering.
 - <b>`max_counts_per_cell`</b> (`int`): Maximum number of counts required for a cell to pass filtering.
-- <b>`qc_var_cell`</b> (`str`='n_genes_by_counts'): Feature in QC metrics that used to filter cells.
+- <b>`qc_var_cell`</b> (`str`='n_genes_by_counts'): Feature in QC metrics that is used to filter cells.
 - <b>`qc`</b> (`bool`=`False`): Call `calculate_qc_metrics` before filtering.
 
 ---
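The min/max-count semantics shared by the three filter methods above can be sketched as a boolean keep-mask. This is a standalone NumPy illustration of the selection logic, not the chunked scaleSC implementation:

```python
import numpy as np

def count_filter(counts, min_count=0, max_count=None):
    """Boolean mask keeping entries whose counts lie in [min_count, max_count].

    Illustrative sketch of the filtering semantics; `counts` would be a QC
    metric such as n_genes_by_counts (cells) or n_cells_by_counts (genes).
    """
    counts = np.asarray(counts)
    keep = counts >= min_count
    if max_count is not None:
        keep &= counts <= max_count
    return keep
```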
@@ -349,7 +348,7 @@ Annotate highly variable genes.
 
 **Note:**
 
-> Only `seurat_v3` is implemented. Raw count matrix is expected as input for `seurat_v3`. HVGs are set to `True` in `adata.var['highly_variable']`.
+> Only `seurat_v3` is implemented. The raw count matrix is expected as input for `seurat_v3`. HVGs are set to `True` in `adata.var['highly_variable']`.
 >
 
 **Args:**
@@ -407,7 +406,7 @@ Compute a neighborhood graph of observations using `rapids-singlecell`.
 normalize_log1p(target_sum=10000.0)
 ```
 
-Normalize counts per cell then log1p.
+Normalize counts per cell, then log1p.
 
 
 
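On a dense CPU matrix, this step amounts to the following NumPy sketch (illustrative only; scaleSC performs it per chunk on the GPU):

```python
import numpy as np

def normalize_log1p_dense(X, target_sum=1e4):
    """Scale each cell (row) to target_sum total counts, then apply log1p."""
    X = np.asarray(X, dtype=float)
    totals = X.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # leave all-zero cells as zeros
    return np.log1p(X / totals * target_sum)
```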

@@ -454,7 +453,7 @@ pca(n_components=50, hvg_var='highly_variable')
 
 Principal component analysis.
 
-Computes PCA coordinates, loadings and variance decomposition. Uses the implementation of scikit-learn.
+Computes PCA coordinates, loadings, and variance decomposition. Uses the implementation of scikit-learn.
 
 
 
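For intuition, PCA on a dense CPU matrix can be sketched with NumPy's SVD (illustrative; the pipeline itself uses a scikit-learn-style implementation on chunked data):

```python
import numpy as np

def pca_svd(X, n_components=2):
    """PCA coordinates, loadings, and explained variance via SVD of centered X."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                          # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    coords = U[:, :n_components] * S[:n_components]  # projected coordinates
    loadings = Vt[:n_components]                     # principal axes (rows)
    explained_var = (S[:n_components] ** 2) / (X.shape[0] - 1)
    return coords, loadings, explained_var
```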

@@ -525,7 +524,7 @@ Save `adata` to disk in chunks.
 to_CPU()
 ```
 
-Move all chunks to CPU.
+Move all chunks to the CPU.
 
 ---
 
@@ -537,7 +536,7 @@ Move all chunks to CPU.
 to_GPU()
 ```
 
-Move all chunks to GPU.
+Move all chunks to the GPU.
 
 ---
 
@@ -703,7 +702,7 @@ read(fname)
 set_cells_filter(filter, update=True)
 ```
 
-Update cells filter and applied on data chunks if `update` set to `True`, otherwise, update filter only.
+Update the cells filter and apply it to data chunks if `update` is set to `True`; otherwise, update the filter only.
 
 ---
 
@@ -715,13 +714,13 @@ Update cells filter and applied on data chunks if `update` set to `True`, otherw
 set_genes_filter(filter, update=True)
 ```
 
-Update genes filter and applied on data chunks if `update` set to True, otherwise, update filter only.
+Update the genes filter and apply it to data chunks if `update` is set to `True`; otherwise, update the filter only.
 
 
 
 **Note:**
 
-> Genes filter can be set sequentially, a new filter should be always compatible with the previous filtered data.
+> Genes filters can be set sequentially; a new filter should always be compatible with the previously filtered data.
 
 ---
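Concretely, the note means each new filter must be sized to the already-filtered data, not to the original data; a plain NumPy illustration:

```python
import numpy as np

genes = np.arange(10)   # pretend gene index after loading
f1 = genes % 2 == 0     # first filter: boolean mask of length 10
genes = genes[f1]       # 5 genes remain
f2 = genes > 2          # next filter must have length 5, not 10
genes = genes[f2]       # sequential filters compose on the filtered data
```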
