Commit 240fbe7

committed
Updated submodule 2
1 parent 0bd0e4a commit 240fbe7

2 files changed

Lines changed: 126 additions & 51 deletions

File tree

AWS/Submodule_2_annotation_only.ipynb

Lines changed: 108 additions & 42 deletions
Original file line number | Diff line number | Diff line change
@@ -174,36 +174,82 @@
174174
"id": "64228197",
175175
"metadata": {},
176176
"source": [
177-
"### **Step 2:** AWS Batch Setup\n",
177+
"## Get Started\n",
178+
"### **Step 2:** Setting up AWS Batch\n",
178179
"\n",
179-
"AWS Batch will create the needed permissions, roles and resources to run Nextflow in a serverless manner. You can set up AWS Batch manually or deploy it **automatically** with a stack template. The Launch Stack button below will take you to the cloud formation create stack webpage with the template with required resources already linked. \n",
180+
"AWS Batch manages the provisioning of compute environments (EC2, Fargate), container orchestration, job queues, IAM roles, and permissions. We can deploy a full environment either:\n",
181+
"- Automatically using a preconfigured AWS CloudFormation stack (**recommended**)\n",
182+
"- Manually by setting up roles, queues, and buckets\n",
183+
"The **Launch Stack** button below opens the AWS CloudFormation create-stack page with the template and its required resources already linked. \n",
180184
"\n",
181-
"If you prefer to skip manual deployment and deploy automatically in the cloud, click the Launch Stack button below. For a walkthrough of the screens during automatic deployment please click [here](https://github.com/NIGMS/NIGMS-Sandbox/blob/main/docs/HowToLaunchAWSBatch.md). The deployment should take ~5 min and then the resources will be ready for use. \n",
185+
"If you prefer to skip manual deployment and deploy automatically in the cloud, click the **Launch Stack** button below. For a walkthrough of the screens during automatic deployment please click [here](https://github.com/NIGMS/NIGMS-Sandbox/blob/main/docs/HowToLaunchAWSBatch.md). The deployment should take ~5 min and then the resources will be ready for use. \n",
182186
"\n",
183-
"[![Launch Stack](../images/LaunchStack.jpg)](https://console.aws.amazon.com/cloudformation/home?#/stacks/new?stackName=aws-batch-nigms&templateURL=https://nigms-sandbox.s3.us-east-1.amazonaws.com/cf-templates/AWSBatch_template.yaml)\n",
187+
"[![Launch Stack](../images/LaunchStack.jpg)](https://console.aws.amazon.com/cloudformation/home?#/stacks/new?stackName=aws-batch-nigms&templateURL=https://nigms-sandbox.s3.us-east-1.amazonaws.com/cf-templates/AWSBatch_template.yaml)\n",
184188
"\n",
189+
"### **Step 3:** Install dependencies, update paths and create a new S3 Bucket to store input and output files\n",
185190
"\n",
186-
"Before beginning this tutorial, if you do not have required roles, policies, permissions or compute environment and would like to **manually** set those up please click [here](https://github.com/NIGMS/NIGMS-Sandbox/blob/main/docs/AWS-Batch-Setup.md) to set that up."
191+
"After setting up an AWS CloudFormation stack, we need to tell the Nextflow workflow where those resources are by providing the configuration:\n",
192+
"<div style=\"border: 1px solid #e57373; padding: 0px; border-radius: 4px;\">\n",
193+
" <div style=\"background-color: #ffcdd2; padding: 5px; \">\n",
194+
" <i class=\"fas fa-exclamation-triangle\" style=\"color: #b71c1c;margin-right: 5px;\"></i><a style=\"color: #b71c1c\"><b>Important</b> - Customize Required</a>\n",
195+
" </div>\n",
196+
" <p style=\"margin-left: 5px;\">\n",
197+
"After successful creation of your stack, you must attach a new role to SageMaker to be able to submit batch jobs. Please follow the steps below to change your SageMaker role:<br>\n",
198+
"<ol> <li>Navigate to your SageMaker AI notebook dashboard (where you initially created and launched your VM)</li> <li>Locate your instance and click the <b>Stop</b> button</li> <li>Once the instance is stopped: <ul> <li>Click <b>Edit</b></li> <li>Scroll to the \"Permissions and encryption\" section</li> <li>Click the IAM role dropdown</li> <li>Select the new role created during stack formation (named something like <b>aws-batch-nigms-SageMakerExecutionRole</b>)</li> </ul> </li> \n",
199+
"<li>Click <b>Update notebook instance</b> to save your changes</li> \n",
200+
"<li>After the update completes: <ul> <li>Click <b>Start</b> to relaunch your instance</li> <li>Reconnect to your instance</li> <li>Resume your work from this point</li> </ul> </li> </ol>\n",
201+
"\n",
202+
"<b>Warning:</b> Make sure to replace the <b>stack name</b> with the name of the stack that you just created. <code>STACK_NAME = \"your-stack-name-here\"</code>\n",
203+
" </p>\n",
204+
"</div>"
187205
]
188206
},
189207
{
190-
"cell_type": "markdown",
191-
"id": "4506a617",
208+
"cell_type": "code",
209+
"execution_count": null,
210+
"id": "e6d78aa5",
211+
"metadata": {},
212+
"outputs": [],
213+
"source": [
214+
"# define a stack name variable\n",
215+
"STACK_NAME = \"aws-batch-nigms-test1\""
216+
]
217+
},
218+
{
219+
"cell_type": "code",
220+
"execution_count": null,
221+
"id": "fc344828",
192222
"metadata": {},
223+
"outputs": [],
224+
"source": [
225+
"import boto3\n",
226+
"# Get account ID and region \n",
227+
"account_id = boto3.client('sts').get_caller_identity().get('Account')\n",
228+
"region = boto3.session.Session().region_name"
229+
]
230+
},
231+
{
232+
"cell_type": "code",
233+
"execution_count": null,
234+
"id": "6c908d53",
235+
"metadata": {},
236+
"outputs": [],
193237
"source": [
194-
"#### Change the parameters as desired in `aws` profile inside `../denovotrascript/nextflow.config` file:\n",
195-
" - Name of your **AWS Batch Job Queue**\n",
196-
" - AWS region \n",
197-
" - Nextflow work directory\n",
198-
" - Nextflow output directory"
238+
"# Set variable names \n",
239+
"# These variables should come from the Intro AWS Batch tutorial (or leave as-is if using the launch stack button)\n",
240+
"BUCKET_NAME = f\"{STACK_NAME}-batch-bucket-{account_id}\"\n",
241+
"AWS_QUEUE = f\"{STACK_NAME}-JobQueue\"\n",
242+
"INPUT_FOLDER = 'nigms-sandbox/nosi-inbremaine-storage/'\n",
243+
"AWS_REGION = region"
199244
]
200245
},
201246
{
202247
"cell_type": "markdown",
203-
"id": "abdb13bb",
248+
"id": "596667bd",
204249
"metadata": {},
205250
"source": [
206-
"### **Step 3:** Install Nextflow"
251+
"#### Install dependencies\n",
252+
"Install Nextflow and Java, which are required to execute the pipeline. In environments like SageMaker, Java is usually pre-installed, but if you are running outside SageMaker (e.g., on EC2 or a local machine), you will need to install it manually."
207253
]
208254
},
209255
{
@@ -213,35 +259,63 @@
213259
"metadata": {},
214260
"outputs": [],
215261
"source": [
216-
"%%capture\n",
217-
"! mamba create -n nextflow -c bioconda nextflow -y\n",
218-
"! mamba install -n nextflow ipykernel -y"
262+
"# Install Nextflow\n",
263+
"! mamba install -y -c conda-forge -c bioconda nextflow --quiet"
219264
]
220265
},
221266
{
222267
"cell_type": "markdown",
223-
"id": "096b76d5",
268+
"id": "9e08a0d5",
224269
"metadata": {},
225270
"source": [
226-
"<div class=\"alert alert-block alert-danger\">\n",
227-
" <i class=\"fa fa-exclamation-circle\" aria-hidden=\"true\"></i>\n",
228-
" <b>Alert: </b> Remember to change your kernel to <b>conda_nextflow</b> to run nextflow.\n",
229-
"</div>"
271+
"<details>\n",
272+
"<summary>Install Java and Nextflow on other systems if needed</summary>\n",
273+
"If you are using a system other than an AWS SageMaker notebook, you might need to install Java and Nextflow using the code below:\n",
274+
"<br> <i># Install java</i><pre>\n",
275+
" sudo apt update\n",
276+
" sudo apt-get install default-jdk -y\n",
277+
" java -version\n",
278+
" </pre>\n",
279+
" <i># Install Nextflow</i><pre>\n",
280+
" curl https://get.nextflow.io | bash\n",
281+
" chmod +x nextflow\n",
282+
" ./nextflow self-update\n",
283+
" ./nextflow plugin update\n",
284+
" </pre>\n",
285+
"</details>"
286+
]
287+
},
288+
{
289+
"cell_type": "markdown",
290+
"id": "c46757a3",
291+
"metadata": {},
292+
"source": [
293+
"# replace batch bucket name in nextflow configuration file\n",
294+
"! sed -i \"s/aws-batch-nigms-batch-bucket-/$BUCKET_NAME/g\" nextflow.config\n",
295+
"# replace job queue name in configuration file \n",
296+
"! sed -i \"s/aws-batch-nigms-JobQueue/$AWS_QUEUE/g\" nextflow.config\n",
297+
"# replace the region placeholder with the region you are in \n",
298+
"! sed -i \"s/aws-region/$AWS_REGION/g\" nextflow.config"
230299
]
231300
},
232301
{
233302
"cell_type": "markdown",
234303
"id": "de3d1b9b",
235304
"metadata": {},
236305
"source": [
237-
"### **Step 4:** Run `denovotranscript`"
306+
"### **Step 4:** Enable AWS Batch for the Nextflow pipeline `denovotranscript`"
238307
]
239308
},
240309
{
241310
"cell_type": "markdown",
242311
"id": "8e1541b9-abb6-47c0-aa49-5c1720680376",
243312
"metadata": {},
244313
"source": [
314+
"Run the pipeline in a cloud-native, serverless manner using AWS Batch. AWS Batch offloads the burden of provisioning and managing compute resources. When you execute this command:\n",
315+
"- Nextflow uploads tasks to AWS Batch. \n",
316+
"- AWS Batch pulls the necessary containers.\n",
317+
"- Each process/task in the pipeline runs as an isolated job in the cloud.\n",
318+
"\n",
245319
"Now we can run `denovotranscript` using the option `annotation_only` run-mode which assumes that the transcriptome has been generated, and will only run the various steps for annotation of the transcripts.\n",
246320
"\n",
247321
">This run should take about **5 minutes**"
@@ -263,7 +337,16 @@
263337
"id": "8a0f8dfb-366d-4e0f-af4e-d96f6ee97d34",
264338
"metadata": {},
265339
"source": [
266-
"The output will be arranged in a directory structure in your Amazon S3 bucket. We will download it into our local directory:"
340+
"The output will be arranged in a directory structure in your Amazon S3 bucket. We will download it into our local directory:\n",
341+
"<div style=\"border: 1px solid #e57373; padding: 0px; border-radius: 4px;\">\n",
342+
" <div style=\"background-color: #ffcdd2; padding: 5px; \">\n",
343+
" <i class=\"fas fa-exclamation-triangle\" style=\"color: #b71c1c;margin-right: 5px;\"></i><a style=\"color: #b71c1c\"><b>Important</b> </a>\n",
344+
" </div>\n",
345+
" <p style=\"margin-left: 5px;\">\n",
346+
"\n",
347+
" Replace \\<Your-Output-Directory-annotation-only> with the name of your local annotation-only output folder. <br>\n",
348+
" </p>\n",
349+
"</div>\n"
267350
]
268351
},
269352
{
@@ -274,7 +357,7 @@
274357
"outputs": [],
275358
"source": [
276359
"! mkdir -p <Your-Output-Directory-annotation-only>\n",
277-
"! aws s3 cp --recursive s3://<YOUR-BUCKET-NAME>/<Your-Output-Directory-annotation-only>/ ./<Your-Output-Directory-annotation-only>"
360+
"! aws s3 cp --recursive s3://$BUCKET_NAME/nextflow_output/ ./<Your-Output-Directory-annotation-only>"
278361
]
279362
},
280363
{
@@ -287,15 +370,6 @@
287370
"! ls -l ./<Your-Output-Directory-annotation-only>"
288371
]
289372
},
290-
{
291-
"cell_type": "markdown",
292-
"id": "1b3ac17d",
293-
"metadata": {},
294-
"source": [
295-
"----\n",
296-
"# Andrea, please update this part"
297-
]
298-
},
299373
{
300374
"cell_type": "markdown",
301375
"id": "337b1049",
@@ -314,14 +388,6 @@
314388
"! cat ./onlyAnnRun/output/RUN_INFO.txt"
315389
]
316390
},
317-
{
318-
"cell_type": "markdown",
319-
"id": "df312985",
320-
"metadata": {},
321-
"source": [
322-
"---"
323-
]
324-
},
325391
{
326392
"cell_type": "markdown",
327393
"id": "4187a790-276c-4bf2-8ce8-2f7985e8c662",
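
A note on the placeholder substitution added in this notebook: the three `sed` commands rewrite fixed placeholder strings in `nextflow.config` with names derived from `STACK_NAME`, the account ID, and the region. The sketch below mirrors that logic in plain Python so it can be checked without touching AWS. The helper names `derive_names` and `apply_placeholders` are hypothetical and not part of the commit, and string replacement stands in for `sed` here (the patterns contain no regex metacharacters, so the two should behave the same).

```python
def derive_names(stack_name: str, account_id: str, region: str) -> dict:
    """Build the resource names the notebook derives from the stack name."""
    return {
        "bucket": f"{stack_name}-batch-bucket-{account_id}",
        "queue": f"{stack_name}-JobQueue",
        "region": region,
    }


def apply_placeholders(config_text: str, names: dict) -> str:
    """Replace the config placeholders in the same order as the sed commands."""
    replacements = [
        ("aws-batch-nigms-batch-bucket-", names["bucket"]),
        ("aws-batch-nigms-JobQueue", names["queue"]),
        ("aws-region", names["region"]),
    ]
    for placeholder, value in replacements:
        config_text = config_text.replace(placeholder, value)
    return config_text


if __name__ == "__main__":
    # Illustrative values only; the account ID is made up.
    names = derive_names("aws-batch-nigms-test1", "123456789012", "us-east-1")
    sample = "awsworkdir = 's3://aws-batch-nigms-batch-bucket-/nextflow_env/'"
    print(apply_placeholders(sample, names))
```

One consequence visible in this sketch: when the stack keeps the default name `aws-batch-nigms`, the derived bucket name itself begins with the bucket placeholder, so applying the substitution a second time would append the account ID again. That suggests running the substitution cell once against a fresh copy of the config.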

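Since the notebook assumes the job queue is named `{STACK_NAME}-JobQueue`, it may be worth confirming that the queue exists before launching the pipeline. The following is a minimal sketch, assuming the stack really uses that naming convention; `job_queue_is_ready` is a hypothetical helper, and the client is passed in as a parameter so the function can be exercised without AWS credentials. It relies only on the AWS Batch `describe_job_queues` call and the `state` and `status` fields of its response.

```python
def job_queue_is_ready(batch_client, queue_name: str) -> bool:
    """Return True if the named AWS Batch job queue is enabled and valid.

    batch_client is expected to behave like boto3.client("batch").
    """
    response = batch_client.describe_job_queues(jobQueues=[queue_name])
    queues = response.get("jobQueues", [])
    return any(
        queue.get("state") == "ENABLED" and queue.get("status") == "VALID"
        for queue in queues
    )


# Possible use inside the notebook, after AWS_QUEUE and AWS_REGION are set:
#   import boto3
#   ready = job_queue_is_ready(boto3.client("batch", region_name=AWS_REGION), AWS_QUEUE)
```

If the function returns `False`, the stack name in `STACK_NAME` is a likely first thing to check.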
denovotranscript/nextflow.config

Lines changed: 18 additions & 9 deletions
Original file line number | Diff line number | Diff line change
@@ -24,6 +24,14 @@ params {
2424
qc_only = false
2525
skip_assembly = false
2626

27+
// AWS parameters
28+
awsqueue = 'aws-batch-nigms-JobQueue'
29+
awsregion = 'aws-region'
30+
awsworkdir = 's3://aws-batch-nigms-batch-bucket-/nextflow_env/'
31+
outdir = 's3://aws-batch-nigms-batch-bucket-/nextflow_output/'
32+
awscli_path = '/home/ec2-user/miniconda/bin/aws'
33+
aws_execrole = 'ExecutionRole'
34+
2735
// Trimming options
2836
adapter_fasta = null
2937
save_trimmed_fail = false
@@ -107,15 +115,16 @@ profiles {
107115
aws {
108116
process {
109117
executor = 'awsbatch'
110-
queue = 'aws-batch-nigms-JobQueue' // Name of your Job queue
118+
queue = params.awsqueue // Name of your Job queue
119+
container = 'quay.io/nf-core/ubuntu:22.04'
111120
}
112-
fusion.enabled = true
113-
wave.enabled = true
114-
aws.region = 'us-east-1' // YOUR AWS REGION
115-
116-
workDir = 's3://<YOUR-BUCKET-NAME>/<Your-Work-Directory>/' // Path of your working directory
117-
params.outdir = 's3://<YOUR-BUCKET-NAME>/<Your-Output-Directory>/' // Path of your output directory
118-
121+
workDir = params.awsworkdir // Path of your working directory
122+
outdir = params.outdir // Path of your output directory
123+
fusion.enabled = false
124+
wave.enabled = false
125+
// Give path to where aws is installed
126+
aws.batch.cliPath = params.awscli_path
127+
aws.region = params.awsregion // YOUR AWS REGION
119128

120129
}
121130
gbatch {
@@ -126,7 +135,7 @@ profiles {
126135
process.machineType = 'n2-highmem-48'
127136

128137
workDir = 'gs://<YOUR-BUCKET-NAME>/<Your-Work-Directory>/' // Path of your working directory
129-
params.outdir = 'gs://<YOUR-BUCKET-NAME>/<Your-Output-Directory>/' // Path of your output directory
138+
outdir = 'gs://<YOUR-BUCKET-NAME>/<Your-Output-Directory>/' // Path of your output directory
130139
}
131140

132141
debug {
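
The new `params` defaults in this file (`'aws-region'` and the bucket paths ending in `batch-bucket-/`) are placeholders that the notebook is expected to overwrite. A small pre-flight check along the following lines could flag a config that was never filled in. This is a sketch under that assumption; `unfilled_placeholders` and the marker strings it searches for are hypothetical choices, not part of the commit. The queue default is not checked because `aws-batch-nigms-JobQueue` is also a legitimate name when the default stack name is used.

```python
# Substrings that only appear while a placeholder is still unfilled.
UNFILLED_MARKERS = {
    "bucket": "batch-bucket-/",  # bucket name with no account ID appended
    "region": "'aws-region'",    # literal region placeholder
}


def unfilled_placeholders(config_text: str) -> list:
    """Return the sorted names of placeholders still present in the config text."""
    return sorted(
        name for name, marker in UNFILLED_MARKERS.items() if marker in config_text
    )


if __name__ == "__main__":
    sample = "awsregion = 'aws-region'"
    print(unfilled_placeholders(sample))
```

In the notebook this could be applied to the contents of `nextflow.config` just before the `nextflow run` cell.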
