|
174 | 174 | "id": "64228197", |
175 | 175 | "metadata": {}, |
176 | 176 | "source": [ |
177 | | - "### **Step 2:** AWS Batch Setup\n", |
| 177 | + "## Get Started\n", |
| 178 | + "### **Step 2:** Setting up AWS Batch\n", |
178 | 179 | "\n", |
179 | | - "AWS Batch will create the needed permissions, roles and resources to run Nextflow in a serverless manner. You can set up AWS Batch manually or deploy it **automatically** with a stack template. The Launch Stack button below will take you to the cloud formation create stack webpage with the template with required resources already linked. \n", |
| 180 | + "AWS Batch manages the provisioning of compute environments (EC2, Fargate), container orchestration, job queues, IAM roles, and permissions. We can deploy a full environment either:\n", |
| 181 | + "- Automatically using a preconfigured AWS CloudFormation stack (**recommended**)\n", |
| 182 | + "- Manually by setting up roles, queues, and buckets\n", |
 | 183 | + "The **Launch Stack** button below opens the AWS CloudFormation Create Stack page with the template and its required resources already linked.\n", |
180 | 184 | "\n", |
181 | | - "If you prefer to skip manual deployment and deploy automatically in the cloud, click the Launch Stack button below. For a walkthrough of the screens during automatic deployment please click [here](https://github.com/NIGMS/NIGMS-Sandbox/blob/main/docs/HowToLaunchAWSBatch.md). The deployment should take ~5 min and then the resources will be ready for use. \n", |
| 185 | + "If you prefer to skip manual deployment and deploy automatically in the cloud, click the **Launch Stack** button below. For a walkthrough of the screens during automatic deployment please click [here](https://github.com/NIGMS/NIGMS-Sandbox/blob/main/docs/HowToLaunchAWSBatch.md). The deployment should take ~5 min and then the resources will be ready for use. \n", |
182 | 186 | "\n", |
183 | | - "[](https://console.aws.amazon.com/cloudformation/home?#/stacks/new?stackName=aws-batch-nigms&templateURL=https://nigms-sandbox.s3.us-east-1.amazonaws.com/cf-templates/AWSBatch_template.yaml)\n", |
| 187 | + "[](https://console.aws.amazon.com/cloudformation/home?#/stacks/new?stackName=aws-batch-nigms&templateURL=https://nigms-sandbox.s3.us-east-1.amazonaws.com/cf-templates/AWSBatch_template.yaml )\n", |
184 | 188 | "\n", |
| 189 | + "### **Step 3:** Install dependencies, update paths and create a new S3 Bucket to store input and output files\n", |
185 | 190 | "\n", |
186 | | - "Before beginning this tutorial, if you do not have required roles, policies, permissions or compute environment and would like to **manually** set those up please click [here](https://github.com/NIGMS/NIGMS-Sandbox/blob/main/docs/AWS-Batch-Setup.md) to set that up." |
 | 191 | + "After setting up the AWS CloudFormation stack, we need to let the Nextflow workflow know where those resources are by providing the configuration:\n", |
| 192 | + "<div style=\"border: 1px solid #e57373; padding: 0px; border-radius: 4px;\">\n", |
| 193 | + " <div style=\"background-color: #ffcdd2; padding: 5px; \">\n", |
| 194 | + " <i class=\"fas fa-exclamation-triangle\" style=\"color: #b71c1c;margin-right: 5px;\"></i><a style=\"color: #b71c1c\"><b>Important</b> - Customize Required</a>\n", |
| 195 | + " </div>\n", |
| 196 | + " <p style=\"margin-left: 5px;\">\n", |
 | 197 | + "After successful creation of your stack you must attach a new role to SageMaker to be able to submit Batch jobs. Please follow these steps to change your SageMaker role:<br>\n", |
| 198 | + "<ol> <li>Navigate to your SageMaker AI notebook dashboard (where you initially created and launched your VM)</li> <li>Locate your instance and click the <b>Stop</b> button</li> <li>Once the instance is stopped: <ul> <li>Click <b>Edit</b></li> <li>Scroll to the \"Permissions and encryption\" section</li> <li>Click the IAM role dropdown</li> <li>Select the new role created during stack formation (named something like <b>aws-batch-nigms-SageMakerExecutionRole</b>)</li> </ul> </li> \n", |
| 199 | + "<li>Click <b>Update notebook instance</b> to save your changes</li> \n", |
| 200 | + "<li>After the update completes: <ul> <li>Click <b>Start</b> to relaunch your instance</li> <li>Reconnect to your instance</li> <li>Resume your work from this point</li> </ul> </li> </ol>\n", |
| 201 | + "\n", |
 | 202 | + "<b>Warning:</b> Make sure to replace the <b>stack name</b> with the name of the stack that you just created: <code>STACK_NAME = \"your-stack-name-here\"</code>\n", |
| 203 | + " </p>\n", |
| 204 | + "</div>" |
187 | 205 | ] |
188 | 206 | }, |
189 | 207 | { |
190 | | - "cell_type": "markdown", |
191 | | - "id": "4506a617", |
| 208 | + "cell_type": "code", |
| 209 | + "execution_count": null, |
| 210 | + "id": "e6d78aa5", |
| 211 | + "metadata": {}, |
| 212 | + "outputs": [], |
| 213 | + "source": [ |
| 214 | + "# define a stack name variable\n", |
| 215 | + "STACK_NAME = \"aws-batch-nigms-test1\"" |
| 216 | + ] |
| 217 | + }, |
| 218 | + { |
| 219 | + "cell_type": "code", |
| 220 | + "execution_count": null, |
| 221 | + "id": "fc344828", |
192 | 222 | "metadata": {}, |
| 223 | + "outputs": [], |
| 224 | + "source": [ |
| 225 | + "import boto3\n", |
| 226 | + "# Get account ID and region \n", |
| 227 | + "account_id = boto3.client('sts').get_caller_identity().get('Account')\n", |
| 228 | + "region = boto3.session.Session().region_name" |
| 229 | + ] |
| 230 | + }, |
| 231 | + { |
| 232 | + "cell_type": "code", |
| 233 | + "execution_count": null, |
| 234 | + "id": "6c908d53", |
| 235 | + "metadata": {}, |
| 236 | + "outputs": [], |
193 | 237 | "source": [ |
194 | | - "#### Change the parameters as desired in `aws` profile inside `../denovotrascript/nextflow.config` file:\n", |
195 | | - " - Name of your **AWS Batch Job Queue**\n", |
196 | | - " - AWS region \n", |
197 | | - " - Nextflow work directory\n", |
198 | | - " - Nextflow output directory" |
| 238 | + "# Set variable names \n", |
| 239 | + "# These variables should come from the Intro AWS Batch tutorial (or leave as-is if using the launch stack button)\n", |
| 240 | + "BUCKET_NAME = f\"{STACK_NAME}-batch-bucket-{account_id}\"\n", |
| 241 | + "AWS_QUEUE = f\"{STACK_NAME}-JobQueue\"\n", |
| 242 | + "INPUT_FOLDER = 'nigms-sandbox/nosi-inbremaine-storage/'\n", |
| 243 | + "AWS_REGION = region" |
199 | 244 | ] |
200 | 245 | }, |
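The naming convention in the cell above can be sketched as a small helper, useful as a sanity check outside the notebook. The `-batch-bucket-{account_id}` and `-JobQueue` suffixes are taken from the cell above; the function name itself is illustrative, not part of the tutorial.

```python
def batch_resource_names(stack_name: str, account_id: str) -> dict:
    """Derive the S3 bucket and Batch job queue names created by the
    CloudFormation stack, assuming the template's naming convention."""
    return {
        "bucket": f"{stack_name}-batch-bucket-{account_id}",
        "queue": f"{stack_name}-JobQueue",
    }
```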
201 | 246 | { |
202 | 247 | "cell_type": "markdown", |
203 | | - "id": "abdb13bb", |
| 248 | + "id": "596667bd", |
204 | 249 | "metadata": {}, |
205 | 250 | "source": [ |
206 | | - "### **Step 3:** Install Nextflow" |
| 251 | + "#### Install dependencies\n", |
 | 252 | + "Installs Nextflow and Java, which are required to execute the pipeline. In environments like SageMaker, Java is usually pre-installed; if you're running outside SageMaker (e.g., on EC2 or locally), you'll need to install it manually." |
207 | 253 | ] |
208 | 254 | }, |
209 | 255 | { |
|
213 | 259 | "metadata": {}, |
214 | 260 | "outputs": [], |
215 | 261 | "source": [ |
216 | | - "%%capture\n", |
217 | | - "! mamba create -n nextflow -c bioconda nextflow -y\n", |
218 | | - "! mamba install -n nextflow ipykernel -y" |
| 262 | + "# Install Nextflow\n", |
| 263 | + "! mamba install -y -c conda-forge -c bioconda nextflow --quiet" |
219 | 264 | ] |
220 | 265 | }, |
221 | 266 | { |
222 | 267 | "cell_type": "markdown", |
223 | | - "id": "096b76d5", |
| 268 | + "id": "9e08a0d5", |
224 | 269 | "metadata": {}, |
225 | 270 | "source": [ |
226 | | - "<div class=\"alert alert-block alert-danger\">\n", |
227 | | - " <i class=\"fa fa-exclamation-circle\" aria-hidden=\"true\"></i>\n", |
228 | | - " <b>Alert: </b> Remember to change your kernel to <b>conda_nextflow</b> to run nextflow.\n", |
229 | | - "</div>" |
| 271 | + "<details>\n", |
| 272 | + "<summary>Install Java and Nextflow if needed in other systems</summary>\n", |
 | 273 | + "If using a system other than an AWS SageMaker notebook, you may need to install Java and Nextflow using the code below:\n", |
| 274 | + "<br> <i># Install java</i><pre>\n", |
| 275 | + " sudo apt update\n", |
| 276 | + " sudo apt-get install default-jdk -y\n", |
| 277 | + " java -version\n", |
| 278 | + " </pre>\n", |
| 279 | + " <i># Install Nextflow</i><pre>\n", |
| 280 | + " curl https://get.nextflow.io | bash\n", |
| 281 | + " chmod +x nextflow\n", |
| 282 | + " ./nextflow self-update\n", |
| 283 | + " ./nextflow plugin update\n", |
| 284 | + " </pre>\n", |
| 285 | + "</details>" |
| 286 | + ] |
| 287 | + }, |
| 288 | + { |
 | 289 | + "cell_type": "code",
 | 290 | + "execution_count": null,
 | 291 | + "id": "c46757a3",
 | 292 | + "metadata": {},
 | 293 | + "outputs": [],
 | 294 | + "source": [
| 293 | + "# replace batch bucket name in nextflow configuration file\n", |
| 294 | + "! sed -i \"s/aws-batch-nigms-batch-bucket-/$BUCKET_NAME/g\" nextflow.config\n", |
| 295 | + "# replace job queue name in configuration file \n", |
| 296 | + "! sed -i \"s/aws-batch-nigms-JobQueue/$AWS_QUEUE/g\" nextflow.config\n", |
| 297 | + "# replace the region placeholder with the region you are in \n", |
| 298 | + "! sed -i \"s/aws-region/$AWS_REGION/g\" nextflow.config" |
230 | 299 | ] |
231 | 300 | }, |
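The three `sed` substitutions above can equivalently be sketched in pure Python; the placeholder strings are the ones the `sed` commands target in `nextflow.config`, while the function name is illustrative.

```python
def fill_placeholders(config_text: str, bucket: str, queue: str, region: str) -> str:
    """Mirror the three sed substitutions: swap the bucket, job queue,
    and region placeholders for the stack's actual resource names."""
    return (config_text
            .replace("aws-batch-nigms-batch-bucket-", bucket)
            .replace("aws-batch-nigms-JobQueue", queue)
            .replace("aws-region", region))
```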
232 | 301 | { |
233 | 302 | "cell_type": "markdown", |
234 | 303 | "id": "de3d1b9b", |
235 | 304 | "metadata": {}, |
236 | 305 | "source": [ |
237 | | - "### **Step 4:** Run `denovotranscript`" |
 | 306 | + "### **Step 4:** Enable AWS Batch for the Nextflow pipeline `denovotranscript`"
238 | 307 | ] |
239 | 308 | }, |
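For reference, a minimal sketch of what the `aws` profile in `nextflow.config` might contain after the substitutions above. All values here are placeholders based on the stack naming convention, not the tutorial's actual file:

```groovy
// Sketch of an 'aws' profile; queue, region, and bucket values are placeholders.
profiles {
    aws {
        process.executor = 'awsbatch'
        process.queue    = 'aws-batch-nigms-test1-JobQueue'
        aws.region       = 'us-east-1'
        workDir          = 's3://aws-batch-nigms-test1-batch-bucket-123456789012/nextflow_work/'
    }
}
```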
240 | 309 | { |
241 | 310 | "cell_type": "markdown", |
242 | 311 | "id": "8e1541b9-abb6-47c0-aa49-5c1720680376", |
243 | 312 | "metadata": {}, |
244 | 313 | "source": [ |
| 314 | + "Run the pipeline in a cloud-native, serverless manner using AWS Batch. AWS Batch offloads the burden of provisioning and managing compute resources. When you execute this command:\n", |
| 315 | + "- Nextflow uploads tasks to AWS Batch. \n", |
| 316 | + "- AWS Batch pulls the necessary containers.\n", |
| 317 | + "- Each process/task in the pipeline runs as an isolated job in the cloud.\n", |
| 318 | + "\n", |
245 | 319 | "Now we can run `denovotranscript` using the option `annotation_only` run-mode which assumes that the transcriptome has been generated, and will only run the various steps for annotation of the transcripts.\n", |
246 | 320 | "\n", |
247 | 321 | ">This run should take about **5 minutes**" |
|
263 | 337 | "id": "8a0f8dfb-366d-4e0f-af4e-d96f6ee97d34", |
264 | 338 | "metadata": {}, |
265 | 339 | "source": [ |
266 | | - "The output will be arranged in a directory structure in your Amazon S3 bucket. We will download it into our local directory:" |
| 340 | + "The output will be arranged in a directory structure in your Amazon S3 bucket. We will download it into our local directory:\n", |
| 341 | + "<div style=\"border: 1px solid #e57373; padding: 0px; border-radius: 4px;\">\n", |
| 342 | + " <div style=\"background-color: #ffcdd2; padding: 5px; \">\n", |
| 343 | + " <i class=\"fas fa-exclamation-triangle\" style=\"color: #b71c1c;margin-right: 5px;\"></i><a style=\"color: #b71c1c\"><b>Important</b> </a>\n", |
| 344 | + " </div>\n", |
| 345 | + " <p style=\"margin-left: 5px;\">\n", |
| 346 | + "\n", |
| 347 | + " Update \\<Your-Output-Directory-annotation-only> to your local annotation only folder. <br>\n", |
| 348 | + " </p>\n", |
| 349 | + "</div>\n" |
267 | 350 | ] |
268 | 351 | }, |
269 | 352 | { |
|
274 | 357 | "outputs": [], |
275 | 358 | "source": [ |
276 | 359 | "! mkdir -p <Your-Output-Directory-annotation-only>\n", |
277 | | - "! aws s3 cp --recursive s3://<YOUR-BUCKET-NAME>/<Your-Output-Directory-annotation-only>/ ./<Your-Output-Directory-annotation-only>" |
| 360 | + "! aws s3 cp --recursive s3://$BUCKET_NAME/nextflow_output/ ./<Your-Output-Directory-annotation-only>" |
278 | 361 | ] |
279 | 362 | }, |
280 | 363 | { |
|
287 | 370 | "! ls -l ./<Your-Output-Directory-annotation-only>" |
288 | 371 | ] |
289 | 372 | }, |
290 | | - { |
291 | | - "cell_type": "markdown", |
292 | | - "id": "1b3ac17d", |
293 | | - "metadata": {}, |
294 | | - "source": [ |
295 | | - "----\n", |
296 | | - "# Andrea, please update this part" |
297 | | - ] |
298 | | - }, |
299 | 373 | { |
300 | 374 | "cell_type": "markdown", |
301 | 375 | "id": "337b1049", |
|
314 | 388 | "! cat ./onlyAnnRun/output/RUN_INFO.txt" |
315 | 389 | ] |
316 | 390 | }, |
317 | | - { |
318 | | - "cell_type": "markdown", |
319 | | - "id": "df312985", |
320 | | - "metadata": {}, |
321 | | - "source": [ |
322 | | - "---" |
323 | | - ] |
324 | | - }, |
325 | 391 | { |
326 | 392 | "cell_type": "markdown", |
327 | 393 | "id": "4187a790-276c-4bf2-8ce8-2f7985e8c662", |
|