|
22 | 22 | "id": "80044322-a021-4fdc-ad83-504961bd1919", |
23 | 23 | "metadata": {}, |
24 | 24 | "source": [ |
25 | | - "The data we are using here comes from SRA. In this example, we are using data from an experiment that compared RNA sequences in honeybees with and without viral infections. The BioProject ID is [PRJNA274674](https://www.ncbi.nlm.nih.gov/bioproject/PRJNA274674). This experiment includes 6 RNA-seq samples and 2 methylation-seq samples. We are only considering the RNA-seq data here. Additionally, we have subsampled them to about 2 million reads collectively across all of the samples. In a real analysis this would not be a good idea, but to keep costs and runtimes low we will use the down-sampled files in this demonstration. If you want to explore the full dataset, we recommend pulling the fastq files using the [STRIDES tutorial on SRA downloads](https://github.com/STRIDES/NIHCloudLabGCP/blob/main/tutorials/notebooks/SRADownload/SRA-Download.ipynb). As with the original example in this module, we have concatenated all 6 files into one set of combined fastq files called apis_joined_R{1,2}.fastq.gz. We have stored the subsampled fastq files in this module's cloud storage bucket." |
| 25 | + "The data we are using here comes from SRA. In this example, we are using data from an experiment that compared RNA sequences in honeybees with and without viral infections. The BioProject ID is [PRJNA274674](https://www.ncbi.nlm.nih.gov/bioproject/PRJNA274674). This experiment includes 6 RNA-seq samples and 2 methylation-seq samples. We are only considering the RNA-seq data here. Additionally, we have subsampled them to about 2 million reads collectively across all of the samples. In a real analysis this would not be a good idea, but to keep costs and runtimes low we will use the down-sampled files in this demonstration. If you want to explore the full dataset, we recommend pulling the fastq files using the [STRIDES tutorial on SRA downloads](https://github.com/STRIDES/NIHCloudLabGCP/blob/main/tutorials/notebooks/SRADownload/SRA-Download.ipynb). As with the original example in this module, we have concatenated all 6 files into one set of combined fastq files called joined_R{1,2}.fastq.gz. We have stored the subsampled fastq files in this module's cloud storage bucket." |
26 | 26 | ] |
27 | 27 | }, |
28 | 28 | { |
|
219 | 219 | "id": "08ae572a-fe6d-4852-a11c-5a2449c1d6b2", |
220 | 220 | "metadata": {}, |
221 | 221 | "source": [ |
222 | | - "You should see the apis_joined fastq files alongside the others that we used in the previous submodules. Now let's adjust the workflow to run on them." |
| 222 | + "You should see the joined fastq files alongside the others that we used in the previous submodules. Now let's adjust the workflow to run on them." |
223 | 223 | ] |
224 | 224 | }, |
225 | 225 | { |
226 | 226 | "cell_type": "markdown", |
227 | 227 | "id": "331a3857-7734-41e4-819a-de3603b9c95b", |
228 | 228 | "metadata": {}, |
229 | 229 | "source": [ |
230 | | - "One of the great benefits of using a workflow manager like Nextflow is that it allows easy swapping of input samples without drastic changes to the code. In the true spirit of reproducible workflows, the only change necessary in order to run the apis samples is to adjust the `reads` line in the `params` section of the `nextflow.config` file to point to the new reads location. In the line below, write the updated reads path that you would add to the config file." |
| 230 | + "One of the great benefits of using a workflow manager like Nextflow is that it allows easy swapping of input samples without drastic changes to the code. In the true spirit of reproducible workflows, the only change necessary in order to run the joined samples is to adjust the `reads` line in the `params` section of the `nextflow.config` file to point to the new reads location. In the line below, write the updated reads path that you would add to the config file." |
231 | 231 | ] |
232 | 232 | }, |
233 | 233 | { |
|
254 | 254 | "\n", |
255 | 255 | "```\n", |
256 | 256 | "// Directory for reads\n", |
257 | | - "reads=\"/home/jupyter/resources/seq2/apis_joined*R[1,2].fastq.gz\"\n", |
| 257 | + "reads=\"/home/jupyter/resources/seq2/joined*R[1,2].fastq.gz\"\n", |
258 | 258 | "```\n", |
259 | 259 | " \n", |
260 | 260 | " \n", |
|
296 | 296 | "metadata": { |
297 | 297 | "environment": { |
298 | 298 | "kernel": "python3", |
299 | | - "name": "common-cpu.m113", |
| 299 | + "name": "r-cpu.4-2.m100", |
300 | 300 | "type": "gcloud", |
301 | | - "uri": "gcr.io/deeplearning-platform-release/base-cpu:m113" |
| 301 | + "uri": "gcr.io/deeplearning-platform-release/r-cpu.4-2:m100" |
302 | 302 | }, |
303 | 303 | "kernelspec": { |
304 | | - "display_name": "PySpark (Local)", |
| 304 | + "display_name": "Python 3", |
305 | 305 | "language": "python", |
306 | | - "name": "local-pyspark" |
| 306 | + "name": "python3" |
307 | 307 | }, |
308 | 308 | "language_info": { |
309 | 309 | "codemirror_mode": { |
|
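The updated `reads` value in this commit is a glob pattern, not a literal path, so it is worth checking which file names it actually selects. A minimal sketch, assuming Python's standard `fnmatch` module as a stand-in: its `*` and `[...]` syntax is similar to, but not guaranteed identical to, the glob matching Nextflow performs. The sketch checks bare file names only (not the full `/home/jupyter/resources/seq2/` path), and the candidate file names are hypothetical.

```python
from fnmatch import fnmatchcase

# File-name part of the reads glob from the updated nextflow.config params block.
READS_PATTERN = "joined*R[1,2].fastq.gz"


def matches(filename: str, pattern: str = READS_PATTERN) -> bool:
    """Return True if a bare file name matches the glob pattern (case-sensitive)."""
    return fnmatchcase(filename, pattern)


# Hypothetical file names, for illustration only.
candidates = [
    "joined_R1.fastq.gz",
    "joined_R2.fastq.gz",
    "apis_joined_R1.fastq.gz",  # old prefix: the pattern must start with "joined"
    "joined_R3.fastq.gz",       # '3' is not in the bracket set
    "joined_R,.fastq.gz",       # ',' IS in the bracket set [1,2]
]

if __name__ == "__main__":
    for name in candidates:
        print(f"{name:28} {matches(name)}")
```

Note that inside brackets, `[1,2]` is a character set containing `1`, `,`, and `2`, so it also accepts a literal comma. Writing `R[12]`, or the brace form `R{1,2}` used in the Nextflow documentation's `fromFilePairs` examples, avoids that; `fnmatch` itself does not support brace alternation, so the brace form cannot be tested with this sketch.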