- "The data we are using here comes from SRA. In this example, we are using data from an experiment that compared RNA sequences in honeybees with and without viral infections. The BioProject ID is [PRJNA274674](https://www.ncbi.nlm.nih.gov/bioproject/PRJNA274674). This experiment includes 6 RNA-seq samples and 2 methylation-seq samples. We are only considering the RNA-seq data here. Additionally, we have subsampled them to about 2 millions reads collectively accross all of the samples. In a real analysis this would not be a good idea, but to keep costs and runtimes low we will use the down-sampled files in this demonstration. If you want to explore the full dataset, we recommend pulling the fastq files using the [STRIDES tutorial on SRA downloads](https://github.com/STRIDES/NIHCloudLabGCP/blob/main/tutorials/notebooks/SRADownload/SRA-Download.ipynb). As with the original example in this module, we have concatenated all 6 files into one set of combined fastq files called joined_R{1,2}.fastq.gz We have stored the subsampled fastq files in this module's cloud storage bucket."
0 commit comments