|
174 | 174 | "id": "64228197", |
175 | 175 | "metadata": {}, |
176 | 176 | "source": [ |
177 | | - "### **Step 2:** AWS Batch Setup\n", |
| 177 | + "## Get Started\n", |
| 178 | + "### **Step 2:** Setting up AWS Batch\n", |
178 | 179 | "\n", |
179 | | - "AWS Batch will create the needed permissions, roles and resources to run Nextflow in a serverless manner. You can set up AWS Batch manually or deploy it **automatically** with a stack template. The Launch Stack button below will take you to the cloud formation create stack webpage with the template with required resources already linked. \n", |
| 180 | + "AWS Batch manages the provisioning of compute environments (EC2, Fargate), container orchestration, job queues, IAM roles, and permissions. We can deploy a full environment either:\n", |
| 181 | + "- Automatically using a preconfigured AWS CloudFormation stack (**recommended**)\n", |
| 182 | + "- Manually by setting up roles, queues, and buckets\n", |
 | 183 | + "The **Launch Stack** button below opens the AWS CloudFormation Create Stack page with the template and its required resources already linked.\n", |
180 | 184 | "\n", |
181 | | - "If you prefer to skip manual deployment and deploy automatically in the cloud, click the Launch Stack button below. For a walkthrough of the screens during automatic deployment please click [here](https://github.com/NIGMS/NIGMS-Sandbox/blob/main/docs/HowToLaunchAWSBatch.md). The deployment should take ~5 min and then the resources will be ready for use. \n", |
| 185 | + "If you prefer to skip manual deployment and deploy automatically in the cloud, click the **Launch Stack** button below. For a walkthrough of the screens during automatic deployment please click [here](https://github.com/NIGMS/NIGMS-Sandbox/blob/main/docs/HowToLaunchAWSBatch.md). The deployment should take ~5 min and then the resources will be ready for use. \n", |
182 | 186 | "\n", |
183 | | - "[](https://console.aws.amazon.com/cloudformation/home?#/stacks/new?stackName=aws-batch-nigms&templateURL=https://nigms-sandbox.s3.us-east-1.amazonaws.com/cf-templates/AWSBatch_template.yaml)\n", |
| 187 | + "[](https://console.aws.amazon.com/cloudformation/home?#/stacks/new?stackName=aws-batch-nigms&templateURL=https://nigms-sandbox.s3.us-east-1.amazonaws.com/cf-templates/AWSBatch_template.yaml )\n", |
184 | 188 | "\n", |
| 189 | + "### **Step 3:** Install dependencies, update paths and create a new S3 Bucket to store input and output files\n", |
185 | 190 | "\n", |
186 | | - "Before beginning this tutorial, if you do not have required roles, policies, permissions or compute environment and would like to **manually** set those up please click [here](https://github.com/NIGMS/NIGMS-Sandbox/blob/main/docs/AWS-Batch-Setup.md) to set that up." |
 | 191 | + "After setting up the AWS CloudFormation stack, we need to let the Nextflow workflow know where those resources are by providing the configuration:\n", |
| 192 | + "<div style=\"border: 1px solid #e57373; padding: 0px; border-radius: 4px;\">\n", |
| 193 | + " <div style=\"background-color: #ffcdd2; padding: 5px; \">\n", |
| 194 | + " <i class=\"fas fa-exclamation-triangle\" style=\"color: #b71c1c;margin-right: 5px;\"></i><a style=\"color: #b71c1c\"><b>Important</b> - Customize Required</a>\n", |
| 195 | + " </div>\n", |
| 196 | + " <p style=\"margin-left: 5px;\">\n", |
 | 197 | + "After successful creation of your stack you must attach a new role to SageMaker to be able to submit Batch jobs. Please follow these steps to change your SageMaker role:<br>\n", |
| 198 | + "<ol> <li>Navigate to your SageMaker AI notebook dashboard (where you initially created and launched your VM)</li> <li>Locate your instance and click the <b>Stop</b> button</li> <li>Once the instance is stopped: <ul> <li>Click <b>Edit</b></li> <li>Scroll to the \"Permissions and encryption\" section</li> <li>Click the IAM role dropdown</li> <li>Select the new role created during stack formation (named something like <b>aws-batch-nigms-SageMakerExecutionRole</b>)</li> </ul> </li> \n", |
| 199 | + "<li>Click <b>Update notebook instance</b> to save your changes</li> \n", |
| 200 | + "<li>After the update completes: <ul> <li>Click <b>Start</b> to relaunch your instance</li> <li>Reconnect to your instance</li> <li>Resume your work from this point</li> </ul> </li> </ol>\n", |
| 201 | + "\n", |
 | 202 | + "<b>Warning:</b> Make sure to replace the <b>stack name</b> with the name of the stack that you just created: <code>STACK_NAME = \"your-stack-name-here\"</code>\n", |
| 203 | + " </p>\n", |
| 204 | + "</div>" |
187 | 205 | ] |
188 | 206 | }, |
189 | 207 | { |
190 | | - "cell_type": "markdown", |
191 | | - "id": "4506a617", |
| 208 | + "cell_type": "code", |
| 209 | + "execution_count": null, |
| 210 | + "id": "e6d78aa5", |
| 211 | + "metadata": {}, |
| 212 | + "outputs": [], |
| 213 | + "source": [ |
| 214 | + "# define a stack name variable\n", |
| 215 | + "STACK_NAME = \"aws-batch-nigms-test1\"" |
| 216 | + ] |
| 217 | + }, |
| 218 | + { |
| 219 | + "cell_type": "code", |
| 220 | + "execution_count": null, |
| 221 | + "id": "fc344828", |
192 | 222 | "metadata": {}, |
| 223 | + "outputs": [], |
| 224 | + "source": [ |
| 225 | + "import boto3\n", |
| 226 | + "# Get account ID and region \n", |
| 227 | + "account_id = boto3.client('sts').get_caller_identity().get('Account')\n", |
| 228 | + "region = boto3.session.Session().region_name" |
| 229 | + ] |
| 230 | + }, |
| 231 | + { |
| 232 | + "cell_type": "code", |
| 233 | + "execution_count": null, |
| 234 | + "id": "6c908d53", |
| 235 | + "metadata": {}, |
| 236 | + "outputs": [], |
193 | 237 | "source": [ |
194 | | - "#### Change the parameters as desired in `aws` profile inside `../denovotrascript/nextflow.config` file:\n", |
195 | | - " - Name of your **AWS Batch Job Queue**\n", |
196 | | - " - AWS region \n", |
197 | | - " - Nextflow work directory\n", |
198 | | - " - Nextflow output directory" |
| 238 | + "# Set variable names \n", |
| 239 | + "# These variables should come from the Intro AWS Batch tutorial (or leave as-is if using the launch stack button)\n", |
| 240 | + "BUCKET_NAME = f\"{STACK_NAME}-batch-bucket-{account_id}\"\n", |
| 241 | + "AWS_QUEUE = f\"{STACK_NAME}-JobQueue\"\n", |
| 242 | + "INPUT_FOLDER = 'nigms-sandbox/nosi-inbremaine-storage/'\n", |
| 243 | + "AWS_REGION = region" |
199 | 244 | ] |
200 | 245 | }, |
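The naming convention in the cell above can be sketched as a small helper, useful as a sanity check outside the notebook. The `-batch-bucket-{account_id}` and `-JobQueue` suffixes are taken from the cell above; the function name itself is illustrative, not part of the tutorial.

```python
def batch_resource_names(stack_name: str, account_id: str) -> dict:
    """Derive the S3 bucket and Batch job queue names created by the
    CloudFormation stack, assuming the template's naming convention."""
    return {
        "bucket": f"{stack_name}-batch-bucket-{account_id}",
        "queue": f"{stack_name}-JobQueue",
    }
```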
201 | 246 | { |
202 | 247 | "cell_type": "markdown", |
203 | | - "id": "abdb13bb", |
| 248 | + "id": "596667bd", |
204 | 249 | "metadata": {}, |
205 | 250 | "source": [ |
206 | | - "### **Step 3:** Install Nextflow" |
| 251 | + "#### Install dependencies\n", |
 | 252 | + "Installs Nextflow and Java, which are required to execute the pipeline. In environments like SageMaker, Java is usually pre-installed; if you're running outside SageMaker (e.g., on EC2 or locally), you'll need to install it manually." |
207 | 253 | ] |
208 | 254 | }, |
209 | 255 | { |
|
213 | 259 | "metadata": {}, |
214 | 260 | "outputs": [], |
215 | 261 | "source": [ |
216 | | - "%%capture\n", |
217 | | - "! mamba create -n nextflow -c bioconda nextflow -y\n", |
218 | | - "! mamba install -n nextflow ipykernel -y" |
| 262 | + "# Install Nextflow\n", |
| 263 | + "! mamba install -y -c conda-forge -c bioconda nextflow --quiet" |
219 | 264 | ] |
220 | 265 | }, |
221 | 266 | { |
222 | 267 | "cell_type": "markdown", |
223 | | - "id": "096b76d5", |
| 268 | + "id": "9e08a0d5", |
224 | 269 | "metadata": {}, |
225 | 270 | "source": [ |
226 | | - "<div class=\"alert alert-block alert-danger\">\n", |
227 | | - " <i class=\"fa fa-exclamation-circle\" aria-hidden=\"true\"></i>\n", |
228 | | - " <b>Alert: </b> Remember to change your kernel to <b>conda_nextflow</b> to run nextflow.\n", |
229 | | - "</div>" |
| 271 | + "<details>\n", |
| 272 | + "<summary>Install Java and Nextflow if needed in other systems</summary>\n", |
 | 273 | + "If using a system other than an AWS SageMaker notebook, you may need to install Java and Nextflow using the code below:\n", |
| 274 | + "<br> <i># Install java</i><pre>\n", |
| 275 | + " sudo apt update\n", |
| 276 | + " sudo apt-get install default-jdk -y\n", |
| 277 | + " java -version\n", |
| 278 | + " </pre>\n", |
| 279 | + " <i># Install Nextflow</i><pre>\n", |
| 280 | + " curl https://get.nextflow.io | bash\n", |
| 281 | + " chmod +x nextflow\n", |
| 282 | + " ./nextflow self-update\n", |
| 283 | + " ./nextflow plugin update\n", |
| 284 | + " </pre>\n", |
| 285 | + "</details>" |
| 286 | + ] |
| 287 | + }, |
| 288 | + { |
 | 289 | + "cell_type": "code",
 | 290 | + "execution_count": null,
 | 291 | + "id": "c46757a3",
 | 292 | + "metadata": {},
 | 293 | + "outputs": [],
 | 294 | + "source": [
| 293 | + "# replace batch bucket name in nextflow configuration file\n", |
| 294 | + "! sed -i \"s/aws-batch-nigms-batch-bucket-/$BUCKET_NAME/g\" nextflow.config\n", |
| 295 | + "# replace job queue name in configuration file \n", |
| 296 | + "! sed -i \"s/aws-batch-nigms-JobQueue/$AWS_QUEUE/g\" nextflow.config\n", |
| 297 | + "# replace the region placeholder with the region you are in \n", |
| 298 | + "! sed -i \"s/aws-region/$AWS_REGION/g\" nextflow.config" |
230 | 299 | ] |
231 | 300 | }, |
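The three `sed` substitutions above can equivalently be sketched in pure Python; the placeholder strings are the ones the `sed` commands target in `nextflow.config`, while the function name is illustrative.

```python
def fill_placeholders(config_text: str, bucket: str, queue: str, region: str) -> str:
    """Mirror the three sed substitutions: swap the bucket, job queue,
    and region placeholders for the stack's actual resource names."""
    return (config_text
            .replace("aws-batch-nigms-batch-bucket-", bucket)
            .replace("aws-batch-nigms-JobQueue", queue)
            .replace("aws-region", region))
```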
232 | 301 | { |
233 | 302 | "cell_type": "markdown", |
234 | 303 | "id": "de3d1b9b", |
235 | 304 | "metadata": {}, |
236 | 305 | "source": [ |
237 | | - "### **Step 4:** Run `denovotranscript`" |
 | 306 | + "### **Step 4:** Enable AWS Batch for the Nextflow pipeline `denovotranscript`"
238 | 307 | ] |
239 | 308 | }, |
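For reference, a minimal sketch of what the `aws` profile in `nextflow.config` might contain after the substitutions above. All values here are placeholders based on the stack naming convention, not the tutorial's actual file:

```groovy
// Sketch of an 'aws' profile; queue, region, and bucket values are placeholders.
profiles {
    aws {
        process.executor = 'awsbatch'
        process.queue    = 'aws-batch-nigms-test1-JobQueue'
        aws.region       = 'us-east-1'
        workDir          = 's3://aws-batch-nigms-test1-batch-bucket-123456789012/nextflow_work/'
    }
}
```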
240 | 309 | { |
241 | 310 | "cell_type": "markdown", |
242 | 311 | "id": "8e1541b9-abb6-47c0-aa49-5c1720680376", |
243 | 312 | "metadata": {}, |
244 | 313 | "source": [ |
| 314 | + "Run the pipeline in a cloud-native, serverless manner using AWS Batch. AWS Batch offloads the burden of provisioning and managing compute resources. When you execute this command:\n", |
| 315 | + "- Nextflow uploads tasks to AWS Batch. \n", |
| 316 | + "- AWS Batch pulls the necessary containers.\n", |
| 317 | + "- Each process/task in the pipeline runs as an isolated job in the cloud.\n", |
| 318 | + "\n", |
245 | 319 | "Now we can run `denovotranscript` using the option `annotation_only` run-mode which assumes that the transcriptome has been generated, and will only run the various steps for annotation of the transcripts.\n", |
246 | 320 | "\n", |
247 | 321 | ">This run should take about **5 minutes**" |
|
263 | 337 | "id": "8a0f8dfb-366d-4e0f-af4e-d96f6ee97d34", |
264 | 338 | "metadata": {}, |
265 | 339 | "source": [ |
266 | | - "The output will be arranged in a directory structure in your Amazon S3 bucket. We will download it into our local directory:" |
| 340 | + "The output will be arranged in a directory structure in your Amazon S3 bucket. We will download it into our local directory:\n", |
| 341 | + "<div style=\"border: 1px solid #e57373; padding: 0px; border-radius: 4px;\">\n", |
| 342 | + " <div style=\"background-color: #ffcdd2; padding: 5px; \">\n", |
| 343 | + " <i class=\"fas fa-exclamation-triangle\" style=\"color: #b71c1c;margin-right: 5px;\"></i><a style=\"color: #b71c1c\"><b>Important</b> </a>\n", |
| 344 | + " </div>\n", |
| 345 | + " <p style=\"margin-left: 5px;\">\n", |
| 346 | + "\n", |
| 347 | + " Update \\<Your-Output-Directory-annotation-only> to your local annotation only folder. <br>\n", |
| 348 | + " </p>\n", |
| 349 | + "</div>\n" |
267 | 350 | ] |
268 | 351 | }, |
269 | 352 | { |
|
274 | 357 | "outputs": [], |
275 | 358 | "source": [ |
276 | 359 | "! mkdir -p <Your-Output-Directory-annotation-only>\n", |
277 | | - "! aws s3 cp --recursive s3://<YOUR-BUCKET-NAME>/<Your-Output-Directory-annotation-only>/ ./<Your-Output-Directory-annotation-only>" |
| 360 | + "! aws s3 cp --recursive s3://$BUCKET_NAME/nextflow_output/ ./<Your-Output-Directory-annotation-only>" |
278 | 361 | ] |
279 | 362 | }, |
280 | 363 | { |
|
287 | 370 | "! ls -l ./<Your-Output-Directory-annotation-only>" |
288 | 371 | ] |
289 | 372 | }, |
290 | | - { |
291 | | - "cell_type": "markdown", |
292 | | - "id": "1b3ac17d", |
293 | | - "metadata": {}, |
294 | | - "source": [ |
295 | | - "----\n", |
296 | | - "# Andrea, please update this part" |
297 | | - ] |
298 | | - }, |
299 | 373 | { |
300 | 374 | "cell_type": "markdown", |
301 | 375 | "id": "337b1049", |
|
314 | 388 | "! cat ./onlyAnnRun/output/RUN_INFO.txt" |
315 | 389 | ] |
316 | 390 | }, |
317 | | - { |
318 | | - "cell_type": "markdown", |
319 | | - "id": "df312985", |
320 | | - "metadata": {}, |
321 | | - "source": [ |
322 | | - "---" |
323 | | - ] |
324 | | - }, |
325 | 391 | { |
326 | 392 | "cell_type": "markdown", |
327 | 393 | "id": "4187a790-276c-4bf2-8ce8-2f7985e8c662", |
|