|
128 | 128 | "! aws s3 ls s3://nigms-sandbox/nosi-inbremaine-storage/resources/seq2/" |
129 | 129 | ] |
130 | 130 | }, |
| 131 | + { |
| 132 | + "cell_type": "markdown", |
| 133 | + "id": "5e8dd0c5", |
| 134 | + "metadata": {}, |
| 135 | + "source": [ |
| 136 | + "***If you have not set up AWS Batch please proceed to Step 2, otherwise proceed to Step 3.***" |
| 137 | + ] |
| 138 | + }, |
131 | 139 | { |
132 | 140 | "cell_type": "markdown", |
133 | 141 | "id": "7a87b0d2", |
134 | 142 | "metadata": {}, |
135 | 143 | "source": [ |
136 | | - "### **Step 2:** AWS Batch Setup\n", |
137 | | - "\n", |
138 | | - "AWS Batch will create the needed permissions, roles and resources to run Nextflow in a serverless manner. You can set up AWS Batch manually or deploy it **automatically** with a stack template. The Launch Stack button below will take you to the cloud formation create stack webpage with the template with required resources already linked. \n", |
| 144 | + "### **Step 2:** Setting up AWS Batch \n", |
139 | 145 | "\n", |
140 | | - "If you prefer to skip manual deployment and deploy automatically in the cloud, click the Launch Stack button below. For a walkthrough of the screens during automatic deployment please click [here](https://github.com/NIGMS/NIGMS-Sandbox/blob/main/docs/HowToLaunchAWSBatch.md). The deployment should take ~5 min and then the resources will be ready for use. \n", |
| 146 | + "AWS Batch manages the provisioning of compute environments (EC2, Fargate), container orchestration, job queues, IAM roles, and permissions. We can deploy a full environment either:\n", |
| 147 | + "- Automatically using a preconfigured AWS CloudFormation stack (**recommended**)\n", |
| 148 | + "- Manually by setting up roles, queues, and buckets\n",
| | + "\n",
| 149 | + "The **Launch Stack** button below takes you to the AWS CloudFormation Create Stack page, with the template and its required resources already linked. \n",
141 | 150 | "\n", |
142 | | - "[](https://console.aws.amazon.com/cloudformation/home?#/stacks/new?stackName=aws-batch-nigms&templateURL=https://nigms-sandbox.s3.us-east-1.amazonaws.com/cf-templates/AWSBatch_template.yaml)\n", |
| 151 | + "To skip manual deployment and deploy automatically in the cloud, click the **Launch Stack** button below. For a walkthrough of the screens shown during automatic deployment, see [here](https://github.com/NIGMS/NIGMS-Sandbox/blob/main/docs/HowToLaunchAWSBatch.md). The deployment takes ~5 minutes, after which the resources are ready for use. \n",
143 | 152 | "\n", |
144 | | - "\n", |
145 | | - "Before beginning this tutorial, if you do not have required roles, policies, permissions or compute environment and would like to **manually** set those up please click [here](https://github.com/NIGMS/NIGMS-Sandbox/blob/main/docs/AWS-Batch-Setup.md) to set that up." |
| 153 | + "[![Launch Stack](images/LaunchStack.jpg)](https://console.aws.amazon.com/cloudformation/home?#/stacks/new?stackName=aws-batch-nigms&templateURL=https://nigms-sandbox.s3.us-east-1.amazonaws.com/cf-templates/AWSBatch_template.yaml)"
146 | 154 | ] |
147 | 155 | }, |
148 | 156 | { |
149 | 157 | "cell_type": "markdown", |
150 | | - "id": "413ac931", |
| 158 | + "id": "1f55c633", |
151 | 159 | "metadata": {}, |
152 | 160 | "source": [ |
153 | | - "#### Change the parameters as desired in `aws` profile inside `../denovotrascript/nextflow.config` file:\n", |
154 | | - " - Name of your **AWS Batch Job Queue**\n", |
155 | | - " - AWS region \n", |
156 | | - " - Nextflow work directory\n", |
157 | | - " - Nextflow output directory" |
| 161 | + "### **Step 3:** Install dependencies, update paths and create a new S3 Bucket to store input and output files\n", |
| 162 | + "\n", |
| 163 | + "After setting up the AWS CloudFormation stack, we need to let the Nextflow workflow know where those resources are by providing the following configuration:\n",
| 164 | + "<div style=\"border: 1px solid #e57373; padding: 0px; border-radius: 4px;\">\n", |
| 165 | + " <div style=\"background-color: #ffcdd2; padding: 5px; \">\n", |
| 166 | + "        <i class=\"fas fa-exclamation-triangle\" style=\"color: #b71c1c;margin-right: 5px;\"></i><a style=\"color: #b71c1c\"><b>Important</b> - Customization Required</a>\n",
| 167 | + " </div>\n", |
| 168 | + " <p style=\"margin-left: 5px;\">\n", |
| 169 | + "After your stack has been created successfully, you must attach a new role to SageMaker so that it can submit Batch jobs. Follow these steps to change your SageMaker role:<br>\n",
| 170 | + "<ol> <li>Navigate to your SageMaker AI notebook dashboard (where you initially created and launched your VM)</li> <li>Locate your instance and click the <b>Stop</b> button</li> <li>Once the instance is stopped: <ul> <li>Click <b>Edit</b></li> <li>Scroll to the \"Permissions and encryption\" section</li> <li>Click the IAM role dropdown</li> <li>Select the new role created during stack formation (named something like <b>aws-batch-nigms-SageMakerExecutionRole</b>)</li> </ul> </li> \n", |
| 171 | + "<li>Click <b>Update notebook instance</b> to save your changes</li> \n", |
| 172 | + "<li>After the update completes: <ul> <li>Click <b>Start</b> to relaunch your instance</li> <li>Reconnect to your instance</li> <li>Resume your work from this point</li> </ul> </li> </ol>\n", |
| 173 | + "\n", |
| 174 | + "<b>Warning:</b> Make sure to replace the <b>stack name</b> with the name of the stack you just created: <code>STACK_NAME = \"your-stack-name-here\"</code>\n",
| 175 | + " </p>\n", |
| 176 | + "</div>" |
158 | 177 | ] |
159 | 178 | }, |
160 | 179 | { |
161 | | - "cell_type": "markdown", |
162 | | - "id": "1f55c633", |
| 180 | + "cell_type": "code", |
| 181 | + "execution_count": null, |
| 182 | + "id": "0da9939e", |
163 | 183 | "metadata": {}, |
| 184 | + "outputs": [], |
164 | 185 | "source": [ |
165 | | - "### **Step 3:** Install Nextflow" |
| 186 | + "# define a stack name variable\n", |
| 187 | + "STACK_NAME = \"aws-batch-nigms-test1\"" |
166 | 188 | ] |
167 | 189 | }, |
168 | 190 | { |
|
172 | 194 | "metadata": {}, |
173 | 195 | "outputs": [], |
174 | 196 | "source": [ |
175 | | - "%%capture\n", |
176 | | - "! mamba create -n nextflow -c bioconda nextflow -y\n", |
177 | | - "! mamba install -n nextflow ipykernel -y" |
| 197 | + "import boto3\n", |
| 198 | + "# Get account ID and region \n", |
| 199 | + "account_id = boto3.client('sts').get_caller_identity().get('Account')\n", |
| 200 | + "region = boto3.session.Session().region_name" |
| 201 | + ] |
| 202 | + }, |
| 203 | + { |
| 204 | + "cell_type": "code", |
| 205 | + "execution_count": null, |
| 206 | + "id": "b52c37c5", |
| 207 | + "metadata": {}, |
| 208 | + "outputs": [], |
| 209 | + "source": [ |
| 210 | + "# Set variable names \n", |
| 211 | + "# These variables should come from the Intro AWS Batch tutorial (or leave as-is if using the launch stack button)\n", |
| 212 | + "BUCKET_NAME = f\"{STACK_NAME}-batch-bucket-{account_id}\"\n", |
| 213 | + "AWS_QUEUE = f\"{STACK_NAME}-JobQueue\"\n", |
| 214 | + "INPUT_FOLDER = 'nigms-sandbox/nosi-inbremaine-storage/'\n", |
| 215 | + "AWS_REGION = region" |
178 | 216 | ] |
179 | 217 | }, |
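If the stack did not create the bucket (for example, when setting the variables manually instead of using the Launch Stack template), a minimal boto3 sketch for creating it is below. The naming helper mirrors the `BUCKET_NAME` convention defined above; the `us-east-1` special case is required by the S3 API, which rejects an explicit `LocationConstraint` for that region.

```python
def bucket_name_for(stack_name: str, account_id: str) -> str:
    # Mirrors the BUCKET_NAME convention defined in the cell above
    return f"{stack_name}-batch-bucket-{account_id}"

def create_results_bucket(bucket: str, region: str) -> None:
    # Create the S3 bucket Nextflow will use for work/output files
    import boto3  # assumed available, as in the cells above
    s3 = boto3.client("s3", region_name=region)
    if region == "us-east-1":
        s3.create_bucket(Bucket=bucket)  # us-east-1 forbids LocationConstraint
    else:
        s3.create_bucket(
            Bucket=bucket,
            CreateBucketConfiguration={"LocationConstraint": region},
        )
```

For example, `create_results_bucket(bucket_name_for(STACK_NAME, account_id), region)` would create the bucket the later cells expect.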
180 | 218 | { |
181 | 219 | "cell_type": "markdown", |
182 | | - "id": "bcb1fe5e", |
| 220 | + "id": "8fce8e92", |
183 | 221 | "metadata": {}, |
184 | 222 | "source": [ |
185 | | - "<div class=\"alert alert-block alert-danger\">\n", |
186 | | - " <i class=\"fa fa-exclamation-circle\" aria-hidden=\"true\"></i>\n", |
187 | | - " <b>Alert: </b> Remember to change your kernel to <b>conda_nextflow</b> to run nextflow.\n", |
188 | | - "</div>" |
| 223 | + "#### Install dependencies\n",
| 224 | + "Install Nextflow and Java, which are required to execute the pipeline. In environments like SageMaker, Java is usually pre-installed; if you're running outside SageMaker (e.g., on EC2 or locally), you'll need to install it manually."
| 225 | + ] |
| 226 | + }, |
| 227 | + { |
| 228 | + "cell_type": "code", |
| 229 | + "execution_count": null, |
| 230 | + "id": "b7625b33", |
| 231 | + "metadata": {}, |
| 232 | + "outputs": [], |
| 233 | + "source": [ |
| 234 | + "# Install Nextflow\n", |
| 235 | + "! mamba install -y -c conda-forge -c bioconda nextflow --quiet" |
189 | 236 | ] |
190 | 237 | }, |
191 | 238 | { |
192 | 239 | "cell_type": "markdown", |
193 | | - "id": "72d1a3b8", |
| 240 | + "id": "80b91ef0", |
194 | 241 | "metadata": {}, |
195 | 242 | "source": [ |
196 | | - "### **Step 4:** Run `denovotranscript`" |
| 243 | + "<details>\n", |
| 244 | + "<summary>Install Java and Nextflow if needed in other systems</summary>\n", |
| 245 | + "If using a system other than an AWS SageMaker notebook, you may need to install Java and Nextflow using the commands below:\n",
| 246 | + "<br> <i># Install java</i><pre>\n", |
| 247 | + " sudo apt update\n", |
| 248 | + " sudo apt-get install default-jdk -y\n", |
| 249 | + " java -version\n", |
| 250 | + " </pre>\n", |
| 251 | + " <i># Install Nextflow</i><pre>\n", |
| 252 | + " curl https://get.nextflow.io | bash\n", |
| 253 | + " chmod +x nextflow\n", |
| 254 | + " ./nextflow self-update\n", |
| 255 | + " ./nextflow plugin update\n", |
| 256 | + " </pre>\n", |
| 257 | + "</details>" |
197 | 258 | ] |
198 | 259 | }, |
199 | 260 | { |
200 | 261 | "cell_type": "code", |
201 | 262 | "execution_count": null, |
202 | | - "id": "ee5985e3-93df-4779-afe1-4464e13bf619", |
| 263 | + "id": "61f28ac4", |
203 | 264 | "metadata": {}, |
204 | 265 | "outputs": [], |
205 | 266 | "source": [ |
206 | | - "! nextflow run main.nf --input test_samplesheet.csv -profile aws --run_mode full" |
| 267 | + "# replace batch bucket name in nextflow configuration file\n", |
| 268 | + "! sed -i \"s/aws-batch-nigms-batch-bucket-/$BUCKET_NAME/g\" ../denovotranscript/nextflow.config\n",
| 269 | + "# replace job queue name in configuration file \n",
| 270 | + "! sed -i \"s/aws-batch-nigms-JobQueue/$AWS_QUEUE/g\" ../denovotranscript/nextflow.config\n",
| 271 | + "# replace the region placeholder with the region you are in \n",
| 272 | + "! sed -i \"s/aws-region/$AWS_REGION/g\" ../denovotranscript/nextflow.config\n",
207 | 273 | ] |
208 | 274 | }, |
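The `sed -i` calls above assume GNU sed; on other systems (macOS/BSD sed, Windows) the same placeholder substitution can be done in pure Python. A sketch — the mapping values here are hypothetical placeholders standing in for the notebook variables:

```python
from pathlib import Path

def apply_placeholders(text: str, mapping: dict) -> str:
    # Same substitutions as the sed commands above, in pure Python
    for old, new in mapping.items():
        text = text.replace(old, new)
    return text

def patch_config(path: str, mapping: dict) -> None:
    # In-place edit, equivalent to `sed -i` but portable across platforms
    cfg = Path(path)
    cfg.write_text(apply_placeholders(cfg.read_text(), mapping))

# Hypothetical values; in the notebook use BUCKET_NAME, AWS_QUEUE, AWS_REGION
mapping = {
    "aws-batch-nigms-batch-bucket-": "my-stack-batch-bucket-123456789012",
    "aws-batch-nigms-JobQueue": "my-stack-JobQueue",
    "aws-region": "us-east-1",
}
print(apply_placeholders('aws { region = "aws-region" }', mapping))
# → aws { region = "us-east-1" }
```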
209 | 275 | { |
210 | 276 | "cell_type": "markdown", |
211 | | - "id": "0117d994-0502-4a58-b07a-861d254f11e2", |
| 277 | + "id": "72d1a3b8", |
212 | 278 | "metadata": {}, |
213 | 279 | "source": [ |
214 | | - "The beauty and power of using a defined workflow in a management system (such as Nextflow) are that we not only get a defined set of steps that are carried out in the proper order, but we also get a well-structured and concise directory structure that holds all pertinent output." |
| 280 | + "### **Step 4:** Run the `denovotranscript` Nextflow pipeline on AWS Batch"
215 | 281 | ] |
216 | 282 | }, |
217 | 283 | { |
218 | 284 | "cell_type": "markdown", |
219 | | - "id": "5ad70acb", |
| 285 | + "id": "92fb30de", |
| 286 | + "metadata": {}, |
| 287 | + "source": [ |
| 288 | + "Run the pipeline in a cloud-native, serverless manner using AWS Batch. AWS Batch offloads the burden of provisioning and managing compute resources. When you execute this command:\n", |
| 289 | + "- Nextflow uploads tasks to AWS Batch. \n", |
| 290 | + "- AWS Batch pulls the necessary containers.\n", |
| 291 | + "- Each process/task in the pipeline runs as an isolated job in the cloud.\n", |
| 292 | + "\n", |
| 293 | + "The beauty and power of using a defined workflow in a management system (such as Nextflow) are that we not only get a defined set of steps that are carried out in the proper order, but we also get a well-structured and concise directory structure that holds all pertinent output.\n" |
| 294 | + ] |
| 295 | + }, |
| 296 | + { |
| 297 | + "cell_type": "code", |
| 298 | + "execution_count": null, |
| 299 | + "id": "ee5985e3-93df-4779-afe1-4464e13bf619", |
220 | 300 | "metadata": {}, |
| 301 | + "outputs": [], |
221 | 302 | "source": [ |
222 | | - "---\n", |
223 | | - "# Andrea, please update the rest for result" |
| 303 | + "! nextflow run main.nf --input test_samplesheet.csv -profile aws --run_mode full" |
224 | 304 | ] |
225 | 305 | }, |
226 | 306 | { |
|
238 | 318 | "metadata": {}, |
239 | 319 | "outputs": [], |
240 | 320 | "source": [ |
241 | | - "! aws s3 ls s3://<YOUR-BUCKET-NAME>/<Your-Output-Directory>/" |
| 321 | + "! aws s3 ls s3://$BUCKET_NAME/nextflow_output/" |
242 | 322 | ] |
243 | 323 | }, |
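To pull the outputs down for local inspection, the listing above can be followed by a recursive copy. A boto3 sketch equivalent to `aws s3 cp --recursive` — the `nextflow_output` folder name is an assumption based on the configuration above:

```python
import pathlib

def results_prefix(bucket: str, outdir: str = "nextflow_output") -> str:
    # S3 URI where the pipeline writes results (output folder name assumed)
    return f"s3://{bucket}/{outdir}/"

def download_results(bucket: str, outdir: str = "nextflow_output",
                     dest: str = "results") -> None:
    # Mirror the output prefix to a local directory
    import boto3  # assumed available, as in the cells above
    s3 = boto3.client("s3")
    pages = s3.get_paginator("list_objects_v2").paginate(Bucket=bucket,
                                                         Prefix=f"{outdir}/")
    for page in pages:
        for obj in page.get("Contents", []):
            target = pathlib.Path(dest) / obj["Key"]
            target.parent.mkdir(parents=True, exist_ok=True)
            s3.download_file(bucket, obj["Key"], str(target))
```

Calling `download_results(BUCKET_NAME)` would then recreate the S3 output tree under a local `results/` directory.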
244 | 324 | { |
|
386 | 466 | "id": "909f6112", |
387 | 467 | "metadata": {}, |
388 | 468 | "source": [ |
389 | | - "## Conclusion" |
| 469 | + "## Conclusion: Why Use AWS Batch?\n", |
| 470 | + "<table border=\"1\" cellpadding=\"8\" cellspacing=\"0\">\n", |
| 471 | + " <thead>\n", |
| 472 | + " <tr>\n", |
| 473 | + " <th>Benefit</th>\n", |
| 474 | + " <th>Explanation</th>\n", |
| 475 | + " </tr>\n", |
| 476 | + " </thead>\n", |
| 477 | + " <tbody>\n", |
| 478 | + " <tr>\n", |
| 479 | + " <td><strong>Scalability</strong></td>\n", |
| 480 | + "      <td>Process large RNA-seq datasets with multiple jobs in parallel</td>\n",
| 481 | + " </tr>\n", |
| 482 | + " <tr>\n", |
| 483 | + " <td><strong>Reproducibility</strong></td>\n", |
| 484 | + " <td>Ensures the exact same Docker containers and config are used every time</td>\n", |
| 485 | + " </tr>\n", |
| 486 | + " <tr>\n", |
| 487 | + " <td><strong>Ease of Management</strong></td>\n", |
| 488 | + " <td>No need to manually manage EC2 instances or storage mounts</td>\n", |
| 489 | + " </tr>\n", |
| 490 | + " <tr>\n", |
| 491 | + " <td><strong>Integration with S3</strong></td>\n", |
| 492 | + " <td>Input/output seamlessly handled via S3 buckets</td>\n", |
| 493 | + " </tr>\n", |
| 494 | + " </tbody>\n", |
| 495 | + "</table>\n", |
| 496 | + "\n", |
| 497 | + "Running on AWS Batch is ideal when your dataset grows beyond what your local notebook or server can handle, or when you want reproducible, cloud-native workflows that are easier to scale, share, and manage."
390 | 498 | ] |
391 | 499 | }, |
392 | 500 | { |
393 | 501 | "cell_type": "markdown", |
394 | 502 | "id": "b68484f3", |
395 | 503 | "metadata": {}, |
396 | 504 | "source": [ |
397 | | - "## Clean Up\n", |
398 | | - "\n", |
399 | | - "Shut down your instance if you are finished." |
| 505 | + "## Clean Up the AWS Environment\n", |
| 506 | + "\n", |
| 507 | + "Once you've successfully run your analysis and downloaded the results, it's a good idea to clean up unused resources to avoid unnecessary charges.\n", |
| 508 | + "\n", |
| 509 | + "#### Recommended Cleanup Steps:\n", |
| 510 | + "\n", |
| 511 | + "- **Delete Output Files from S3 (Optional)** \n", |
| 512 | + " If you've downloaded your results locally and no longer need them stored in the cloud.\n", |
| 513 | + "- **Delete the S3 Bucket (Optional)** \n", |
| 514 | + " To remove the entire bucket (only do this if you're sure!)\n", |
| 515 | + "- **Shut Down AWS Batch Resources (Optional but Recommended):** \n", |
| 516 | + " If you used a CloudFormation stack to set up AWS Batch, you can delete all associated resources in one step (⚠️ Note: Deleting the stack will also remove IAM roles and compute environments created by the template.):\n", |
| 517 | + " + Go to the <a href=\"https://console.aws.amazon.com/cloudformation/\">AWS CloudFormation Console</a>\n", |
| 518 | + " + Select your stack (e.g., <code>aws-batch-nigms-test1</code>)\n", |
| 519 | + " + Click Delete\n", |
| 520 | + " + Wait for all resources (compute environments, roles, queues) to be removed\n", |
| 521 | + " \n", |
| 522 | + "<div style=\"border: 1px solid #659078; padding: 0px; border-radius: 4px;\">\n", |
| 523 | + " <div style=\"background-color: #d4edda; padding: 5px; font-weight: bold;\">\n", |
| 524 | + " <i class=\"fas fa-lightbulb\" style=\"color: #0e4628;margin-right: 5px;\"></i><a style=\"color: #0e4628\">Tips</a>\n", |
| 525 | + " </div>\n", |
| 526 | + " <p style=\"margin-left: 5px;\">\n", |
| 527 | + "It’s always good practice to periodically review your <b>EC2 instances</b>, <b>ECR containers</b>, <b>S3 storage</b>, and <b>CloudWatch logs</b> to ensure no stray resources are incurring charges.\n", |
| 528 | + " </p>\n", |
| 529 | + "</div>" |
400 | 530 | ] |
401 | 531 | } |
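The console cleanup steps above can also be scripted. A sketch that builds the equivalent AWS CLI commands (the stack and bucket names here are hypothetical placeholders; these commands are destructive, so review them before running):

```python
def cleanup_commands(stack: str, bucket: str) -> list[str]:
    # AWS CLI equivalents of the console cleanup steps above
    return [
        f"aws s3 rm s3://{bucket} --recursive",                   # delete output files
        f"aws s3 rb s3://{bucket}",                               # remove the empty bucket
        f"aws cloudformation delete-stack --stack-name {stack}",  # tear down Batch resources
    ]

# Hypothetical names; substitute your own stack and bucket
for cmd in cleanup_commands("aws-batch-nigms-test1",
                            "aws-batch-nigms-test1-batch-bucket-123456789012"):
    print(cmd)
```

Deleting the stack removes the compute environments, job queues, and IAM roles it created, matching the console flow described above.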
402 | 532 | ], |
|