Commit b68d3d6

updated submodule 2
1 parent a61ef58 commit b68d3d6

7 files changed

Lines changed: 189 additions & 1878 deletions

AWS/Submodule_2_annotation_only.ipynb

Lines changed: 81 additions & 4 deletions
@@ -9,6 +9,56 @@
 "# Notebook 2: Using \"denovoscript\" to Perform an \"Annotation Only\" Run"
 ]
 },
+{
+"cell_type": "markdown",
+"id": "5b36451c",
+"metadata": {},
+"source": [
+"## Overview\n",
+"\n",
+"This Jupyter Notebook provides a learning module on transcriptome assembly, specifically focusing on annotation using the `denovoscript` pipeline. It guides users through an \"annotation only\" run, assuming a pre-assembled transcriptome. The notebook begins with an introductory video and illustration of the annotation workflow. It then demonstrates downloading a rainbow trout transcriptome from an Amazon S3 bucket and counting its sequences. Users are instructed to set up AWS Batch for serverless Nextflow execution, either automatically via a CloudFormation template or manually. After installing Nextflow and switching the kernel, `denovoscript` is executed in annotation-only mode using the downloaded transcriptome. Results are then downloaded from S3 to the local directory for inspection. The notebook then introduces the concept of Docker containers and guides users through running BUSCO within a container to assess transcriptome completeness using the vertebrata gene set. Finally, interactive quizzes prompt users to interpret BUSCO, GO, and TransDecoder results, emphasizing the importance of understanding data provenance. A second, user-driven BUSCO analysis on a different transcriptome is assigned as a final exercise, encouraging exploration of different lineages and critical evaluation of results."
+]
+},
+{
+"cell_type": "markdown",
+"id": "d8fbab1d",
+"metadata": {},
+"source": [
+"## Learning Objectives\n",
+"\n",
+"* **Understanding transcriptome annotation:** Learn the process of annotating a pre-assembled transcriptome.\n",
+"* **Using `denovoscript` for annotation:** Gain practical experience using the `denovoscript` pipeline with the `annotation_only` run mode.\n",
+"* **Working with AWS Batch:** Learn how to set up and utilize AWS Batch for running Nextflow pipelines in a serverless environment.\n",
+"* **Understanding and using Docker containers:** Become familiar with Docker containers and how to execute bioinformatics tools like BUSCO within them.\n",
+"* **Assessing transcriptome completeness with BUSCO:** Learn how to use BUSCO to evaluate the completeness of a transcriptome assembly using different lineage datasets.\n",
+"* **Interpreting BUSCO, GO, and TransDecoder results:** Develop skills in interpreting the output files generated by these tools and understanding their implications.\n",
+"* **Understanding data provenance:** Appreciate the importance of considering the origin and processing of transcriptomic data before analysis.\n",
+"* **Running BUSCO analysis independently:** Apply learned concepts by independently constructing and executing BUSCO commands for different transcriptomes and lineages.\n",
+"* **Critical evaluation of BUSCO results:** Learn to analyze BUSCO results critically, considering factors such as transcriptome quality, lineage selection, and biological explanations for observed patterns (e.g., duplicated or fragmented genes)."
+]
+},
+{
+"cell_type": "markdown",
+"id": "0cf3bb78",
+"metadata": {},
+"source": [
+"## Prerequisites\n",
+"\n",
+"**1. Software/Environment:**\n",
+"\n",
+"* Jupyter Notebook (Python kernel)\n",
+"* AWS CLI (configured)\n",
+"* Nextflow (installed via `mamba`, switch kernel to `conda_nextflow`)\n",
+"* Docker (running, user permissions correct)\n",
+"* `jupytercards` (install via `pip`)\n",
+"\n",
+"**2. Enabled APIs:**\n",
+"\n",
+"* AWS Batch\n",
+"* Amazon S3\n",
+"\n"
+]
+},
 {
 "cell_type": "markdown",
 "id": "16adea33",
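
The Overview added in this hunk mentions counting the sequences in the downloaded transcriptome. This diff does not show the notebook's own command for that step, so the following is only a minimal Python sketch of the idea, assuming the transcriptome is a plain-text FASTA file in which every record starts with a `>` header line. The function name `count_fasta_records` and the example sequences are hypothetical and do not come from the notebook.

```python
import io
from typing import Iterable


def count_fasta_records(lines: Iterable[str]) -> int:
    """Count FASTA records by counting header lines, which start with '>'."""
    return sum(1 for line in lines if line.startswith(">"))


# Tiny in-memory example with made-up sequences (not the notebook's data).
example = io.StringIO(">tx1\nACGT\nACGT\n>tx2\nGGCC\n")
n_records = count_fasta_records(example)
print(n_records)
```

For a file on disk, the same function can be given an open file handle, for example `with open(path) as fh: count_fasta_records(fh)`, where `path` stands for wherever the transcriptome was saved.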
@@ -141,7 +191,7 @@
 "id": "4506a617",
 "metadata": {},
 "source": [
-"#### Change the parameters as desired in `aws` profile inside `../denovotrascript/nextflow.config` file\n",
+"#### Change the parameters as desired in the `aws` profile inside the `../denovotrascript/nextflow.config` file:\n",
 " - Name of your **AWS Batch Job Queue**\n",
 " - AWS region \n",
 " - Nextflow work directory\n",
@@ -239,22 +289,39 @@
 },
 {
 "cell_type": "markdown",
-"id": "f3255502-270c-4ebb-9f72-141c4fab5c0f",
+"id": "1b3ac17d",
 "metadata": {},
 "source": [
-"**Step 6:** Let's take a look at the `RUN_INFO.txt` file to see what the parameters and programs associated with our analysis were."
+"----\n",
+"# Andrea, please update this part"
+]
+},
+{
+"cell_type": "markdown",
+"id": "337b1049",
+"metadata": {},
+"source": [
+"Let's take a look at the `RUN_INFO.txt` file to see which parameters and programs were used in our analysis."
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
-"id": "a14265d6-bd68-4f08-a55e-c472c3f23faa",
+"id": "e69ee1bb",
 "metadata": {},
 "outputs": [],
 "source": [
 "! cat ./onlyAnnRun/output/RUN_INFO.txt"
 ]
 },
+{
+"cell_type": "markdown",
+"id": "df312985",
+"metadata": {},
+"source": [
+"---"
+]
+},
 {
 "cell_type": "markdown",
 "id": "4187a790-276c-4bf2-8ce8-2f7985e8c662",
@@ -523,6 +590,16 @@
 "# Put your BUSCO command here"
 ]
 },
+{
+"cell_type": "markdown",
+"id": "8e8b88c4",
+"metadata": {},
+"source": [
+"## Conclusion\n",
+"\n",
+"This notebook provided hands-on experience in transcriptome annotation using the `denovoscript` pipeline in annotation-only mode, with AWS Batch for serverless execution and Docker containers for BUSCO analysis. Users set up AWS Batch, ran `denovoscript` to annotate a rainbow trout transcriptome, assessed transcriptome completeness with BUSCO, and interpreted the results of the BUSCO, GO, and TransDecoder analyses. The notebook also emphasized data provenance and ended with an independent BUSCO exercise in which users apply these skills to a different transcriptome and critically evaluate the outcome."
+]
+},
 {
 "cell_type": "markdown",
 "id": "5bc80021",
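
The Conclusion cell asks readers to interpret BUSCO results. As a hedged illustration, the sketch below parses a one-line BUSCO score string into its categories (complete, single-copy, duplicated, fragmented, missing, total). It assumes the compact `C:..%[S:..%,D:..%],F:..%,M:..%,n:..` notation commonly seen in BUSCO short summaries; the exact layout may differ between BUSCO versions. The name `parse_busco_summary` and the percentages in the example line are invented for illustration and are not results from this notebook.

```python
import re

# Assumed one-line BUSCO notation: C:..%[S:..%,D:..%],F:..%,M:..%,n:..
BUSCO_PATTERN = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco_summary(line: str) -> dict:
    """Return BUSCO percentages as floats and the total gene count n as int."""
    match = BUSCO_PATTERN.search(line)
    if match is None:
        raise ValueError("no BUSCO summary found in line")
    fields = match.groupdict()
    result = {key: float(fields[key]) for key in ("C", "S", "D", "F", "M")}
    result["n"] = int(fields["n"])
    return result


# Made-up example line (not a real result from the notebook).
example_line = "C:80.0%[S:70.0%,D:10.0%],F:12.5%,M:7.5%,n:200"
summary = parse_busco_summary(example_line)
print(summary)
```

Having the categories as numbers makes it easier to compare runs across lineages, for instance by checking whether a high duplicated fraction (`D`) accompanies a high complete fraction (`C`).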
