4 changes: 1 addition & 3 deletions submodule01_genome_sequencing_and_assembly.ipynb
@@ -10,9 +10,7 @@
"## Overview\n",
"In this submodule you will learn how DNA and RNA sequencing data is produced and how to reconstruct the genome of an organism from short DNA sequences generated by sequencing technologies. The process will be completed through an interactive markdown document and will use command-line bioinformatic tools. We will explore the key concepts and methodologies in genomics, including how to assemble a genome, how to assess its quality, how to annotate the genome, and how to perform comparative analyses.\n",
"\n",
"<p align=\"center\">\n",
" <img src=\"images/diagram-WGS.png\" width=\"50%\"/>\n",
"</p>\n",
"![Overview of the bioinformatic workflow](images/diagram-WGS.png)\n",
"\n",
"Figure 1. Overview of the bioinformatic workflow for the four submodules in this tutorial.\n",
"### Learning Objectives:\n",
4 changes: 2 additions & 2 deletions submodule02_assembly_assessment.ipynb
@@ -698,7 +698,7 @@
"\n",
"# taxonomy database setup\n",
"# copy nodesDB from AWS S3 bucket (file is too large to hold in github)\n",
"aws s3 cp s3://nigms-sandbox/comparative-microbial-genomics/databases/nodesDB.txt wgs-nf/databases/blast_db/\n",
"aws s3 cp s3://nigms-sandbox/comparative-microbial-genomics/databases/nodesDB.txt wgs-nf/databases/blast_db/ --no-sign-request\n",
"taxdb=wgs-nf/databases/blast_db/nodesDB.txt\n",
"\n",
"# Create lookup table\n",
@@ -855,7 +855,7 @@
"outdir=wgs-nf/databases/bakta_db/\n",
"\n",
"# copy from aws S3 bucket\n",
"aws s3 cp s3://nigms-sandbox/comparative-microbial-genomics/databases/bakta-light/ $outdir --recursive --quiet\n",
"aws s3 cp s3://nigms-sandbox/comparative-microbial-genomics/databases/bakta-light/ $outdir --recursive --quiet --no-sign-request\n",
"echo \"Bakta database copied from S3\"\n",
"\n",
"# update amrfinder database\n",
27 changes: 0 additions & 27 deletions submodule03_automate_workflow.ipynb
@@ -344,19 +344,6 @@
"<div class=\"alert alert-block alert-warning\"> <b>Attention:</b> Before You Proceed: Pause here and wait for the nextflow code to finish running. The brackets around the block of code will switch from '*' to a number when it is completed and you will see the duration displayed at the bottom of the output. </div>"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "64917be4-8cb5-49af-954f-f2119d0183bd",
"metadata": {},
"outputs": [],
"source": [
"%%bash\n",
"\n",
"# view contents of output directory\n",
"ls wgs-nf/output-dir/*"
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -400,20 +387,6 @@
"grep 'Missing' wgs-nf/output-dir/output-busco/*"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f0a2e9ef-cf11-447d-84cc-51994451ead6",
"metadata": {},
"outputs": [],
"source": [
"%%bash\n",
"\n",
"# final output directory for the next steps\n",
"\n",
"ls wgs-nf/output-dir/proteomes"
]
},
{
"cell_type": "markdown",
"id": "c74ba0b8-ab2e-4c70-be8e-1caa7524cc94",
66 changes: 30 additions & 36 deletions submodule05_Nextflow_AWSBatch.ipynb
@@ -55,7 +55,7 @@
"+ Please ensure you have a VPC, subnets, and security group set up before running this tutorial.\n",
"+ Role with AdministratorAccess, AmazonSageMakerFullAccess, S3 access and AWSBatchServiceRole.\n",
"+ Instance Role with AmazonECS_FullAccess, AmazonEC2ContainerRegistryFullAccess, and S3 access.\n",
"+ If you do not have the required set-up for AWS Batch please follow this tutorial [here](https://github.com/STRIDES/NIHCloudLabAWS/blob/zbyosufzai-awsbatch-1/notebooks/AWSBatch/Intro_AWS_Batch.ipynb#install_nextflow). ***When making the instance role, make another for SageMaker notebooks with the following permissions: AdminstratorAccess, AmazonEC2ContainerRegistryFullAccess, AmazonECS_FullAccess, AmazonS3FullAccess, AmazonSageMakerFullAccess, and AWSBatchServiceRole.***\n",
"+ If you do not have the required set-up for AWS Batch please follow this tutorial [here](https://github.com/STRIDES/NIHCloudLabAWS/blob/main/notebooks/AWSBatch/Intro_AWS_Batch.ipynb). ***When making the instance role, make another for SageMaker notebooks with the following permissions: AdminstratorAccess, AmazonEC2ContainerRegistryFullAccess, AmazonECS_FullAccess, AmazonS3FullAccess, AmazonSageMakerFullAccess, and AWSBatchServiceRole.***\n",
"It is recommended that specific permission to folders are added through inline policy. An example of the JSON is below:\n",
"\n",
"<pre>\n",
@@ -130,37 +130,8 @@
"metadata": {},
"outputs": [],
"source": [
"# Install Nextflow\n",
"! mamba install -y -c conda-forge -c bioconda nextflow --quiet"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fdc0c5dc",
"metadata": {},
"outputs": [],
"source": [
"##### Import relevant libraries\n",
"# Created using this https://github.com/STRIDES/NIHCloudLabAWS/blob/zbyosufzai-awsbatch-1/notebooks/AWSBatch/Intro_AWS_Batch.ipynb#install_nextflow\n",
"#Run if you don't have Java installed\n",
"! sudo apt update\n",
"! sudo apt-get install default-jdk -y\n",
"! java -version"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3124016c",
"metadata": {},
"outputs": [],
"source": [
"#Install nexflow, make it exceutable, and update it\n",
"! curl https://get.nextflow.io | bash\n",
"! chmod +x nextflow\n",
"! ./nextflow self-update\n",
"! ./nextflow plugin update"
"#Make an s3 bucket to store input and output files\n",
"!aws s3 mb s3://$BUCKET_NAME"
]
},
{
@@ -170,8 +141,8 @@
"metadata": {},
"outputs": [],
"source": [
"# replace batch bucket name in nextflow configuration file\n",
"! sed -i \"s/aws-batch-nigms-batch-bucket-/$BUCKET_NAME/g\" wgsbac/nextflow.config"
"# replace batch bucket name in nextflow configuration file \n",
"! sed -i \"s/aws-batch-nigms-batch-bucket/$BUCKET_NAME/g\" wgsbac/nextflow.config"
]
},
{
@@ -212,7 +183,12 @@
"outputs": [],
"source": [
"# Run nextflow script with parameters \n",
"! ./nextflow run wgsbac/main.nf --input s3://$INPUT_FOLDER/samplesheet_test.csv -profile docker,awsbatch -c wgsbac/nextflow.config --awsqueue $AWS_QUEUE --awsregion $AWS_REGION"
"! ./nextflow run wgsbac/main.nf \\\n",
" --input s3://$INPUT_FOLDER/samplesheet_test.csv \\\n",
" -profile docker,awsbatch \\\n",
" -c wgsbac/nextflow.config \\\n",
" --awsqueue $AWS_QUEUE \\\n",
" --awsregion $AWS_REGION "
]
},
{
@@ -295,7 +271,25 @@
]
}
],
"metadata": {},
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.14"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
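
Note on the `sed` change in `submodule05_Nextflow_AWSBatch.ipynb`: the pattern lost its trailing hyphen (`aws-batch-nigms-batch-bucket-` became `aws-batch-nigms-batch-bucket`). The sketch below shows why that can matter. It is not taken from the repository: the config line and the bucket name are hypothetical, and the real `wgsbac/nextflow.config` may look different. It only illustrates that a pattern with a trailing hyphen silently matches nothing when the placeholder in the file has none.

```shell
#!/bin/sh
# Hypothetical config line; the actual contents of wgsbac/nextflow.config are assumed here.
tmp_config=$(mktemp)
echo "workDir = 's3://aws-batch-nigms-batch-bucket/work'" > "$tmp_config"

# Hypothetical bucket name, standing in for the notebook's $BUCKET_NAME variable.
BUCKET_NAME="my-example-bucket"

# Old pattern (with trailing hyphen): nothing matches, so the line is left unchanged.
old_result=$(sed "s/aws-batch-nigms-batch-bucket-/$BUCKET_NAME/g" "$tmp_config")

# New pattern (no trailing hyphen): the placeholder is replaced with the bucket name.
new_result=$(sed "s/aws-batch-nigms-batch-bucket/$BUCKET_NAME/g" "$tmp_config")

echo "old pattern: $old_result"
echo "new pattern: $new_result"

rm -f "$tmp_config"
```

Because `sed` exits successfully even when no substitution is made, a quick `grep "$BUCKET_NAME" wgsbac/nextflow.config` after the replacement cell is one way to confirm the substitution took effect.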