GDC Workflow Runner

Overview

GDC workflows are written in Common Workflow Language (CWL) and can be found in the NCI-GDC GitHub organisation.
GDC workflows are used in production with the GDC Pipeline Automation System (GPAS). For the 4 workflows that need to be tested, we created external user entrypoints that can be used independently of GPAS. Check the README in each repo for more details.
- DNA alignment
  - To convert user-submitted DNA-Seq (WGS, WXS) BAM files into a GDC re-aligned BAM file.
  - Other files, such as a BAI file and alignment metrics, are also generated.
- WGS variant calling
  - To accept a pair of tumor and normal WGS BAM files, and derive somatic mutations in VCF, TSV, and BEDPE formats, along with other outputs.
- WXS variant calling
  - To accept a pair of tumor and normal WXS BAM files, and derive somatic mutations in VCF, and other outputs.
- RNA alignment
  - To accept BAM or FASTQ inputs, and derive 3 different BAMs, a quantification TSV, a splice junction TSV, and other outputs.
GDC workflows pull Docker images. All external images are public, and internal images are hosted on quay.io. We have created a Quay group to share the required images with the APS team for testing purposes. (We will need the Quay IDs of AWP team members to add them to this group.)
GDC workflows require input molecular files, which are stored in the uchig-genomics-pipeline-us-east-1 S3 bucket.
GDC workflows require other reference files (such as the human genome sequence), which are also stored in the uchig-genomics-pipeline-us-east-1 bucket.

Figure 1: Overview of GDC workflow

The first workflow that we will run is the DNA-Seq alignment workflow on a 2.5 GB WGS BAM file.

Prereqs

The EC2 instance resources required depend on the type of workflow and the size of the input file. For this run (we used a c5d.4xlarge):
- CPUs > 4
- RAM > 12 GB
- disk space > 50 GB
Access to the gdc-dnaseq-cwl workflow repository on GitHub.
Access to the uchig-genomics-pipeline-us-east-1 bucket.
Requirements on the instance:
- awscli
- docker
- Access to quay (for docker images)
- python
- cwltool
- nodejs
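
The tool requirements above can be checked with a short script before starting. This is a minimal sketch and not part of the repository: the function names `missing_tools` and `check_prereqs` are hypothetical, and the executable names (`aws`, `docker`, `python3`, `cwltool`, `node`) are assumptions about how each requirement is installed on the instance. It does not check Quay access.

```shell
# Sketch: report which required command-line tools are missing from PATH.
# The executable names passed in check_prereqs are assumptions.

missing_tools() {
    # Print each tool from the argument list that is not found on PATH.
    local tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

check_prereqs() {
    # Return non-zero and list the missing tools if any requirement is absent.
    local missing
    missing=$(missing_tools aws docker python3 cwltool node)
    if [ -n "$missing" ]; then
        printf 'Missing tools:\n%s\n' "$missing" >&2
        return 1
    fi
    echo "All required tools found."
}
```

Calling `check_prereqs` on the instance prints the missing tools, if any, and returns a non-zero status so that it can gate a longer setup script.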

We have checked in a Chef cookbook (gpas-worker) that can be used to build an AMI with all the requirements baked in. You can find the instructions here.

Running the workflow

Download requirements

Pull the required repositories.

The DNA-Seq alignment workflow

git clone -b feat/BINF-309 git@github.com:NCI-GDC/gdc-dnaseq-cwl.git

Scripts to run the workflow

git clone git@github.com:NCI-GDC/gpas-aws-workflow-runner.git

cd gpas-aws-workflow-runner/workflows/
./download-input-files.sh

Pack the CWL workflow into a JSON file. We use this internally to pass the workflow as a payload.

./pack-workflow.sh /path/to/gdc-dnaseq-cwl/workflows/main/gdc_dnaseq_main_workflow.cwl

Download the input BAM file and its index file.

aws s3 cp s3://uchig-genomics-pipeline-us-east-1/bioinformatics_scratch/shenglai/binf389/COLO-829.bam .

Edit WGS-hello-world.input.json, replacing the placeholders with the paths to the input and reference files.
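
If you prefer to script this edit rather than make it by hand, something like the following could work. It is only a sketch: the structure of WGS-hello-world.input.json and the spelling of its placeholders are not shown here, so the token `INPUT_BAM_PATH` is purely hypothetical, as is the helper name `fill_placeholder`. The placeholder is treated as a sed pattern, so it should not contain regular-expression metacharacters.

```shell
# Sketch: replace a placeholder token in a file and print the result.
# INPUT_BAM_PATH in the usage line is a hypothetical placeholder name.

fill_placeholder() {
    # Usage: fill_placeholder FILE PLACEHOLDER VALUE
    # Writes FILE to stdout with every PLACEHOLDER replaced by VALUE.
    local file="$1" placeholder="$2" value="$3" esc_value
    # Escape characters that are special in a sed replacement:
    # backslash, ampersand, and the | delimiter used below.
    esc_value=$(printf '%s' "$value" | sed 's/[\\&|]/\\&/g')
    sed "s|$placeholder|$esc_value|g" "$file"
}

# Example (writes to a new file and leaves the original untouched):
# fill_placeholder WGS-hello-world.input.json INPUT_BAM_PATH \
#     /mnt/SCRATCH/COLO-829.bam > filled.input.json
```

Writing to a separate output file keeps the original template available for later runs.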

Run workflow

Run the script from the directory where you want the output files to be stored.

$ df -h /mnt
/dev/nvme0n1    366G   57G  310G  16% /mnt

cd /mnt/SCRATCH
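
Before launching, it may help to confirm that the output directory has more free space than the disk minimum listed under Prereqs. The sketch below parses `df` output; the helper names `free_gb` and `require_free_gb` are hypothetical and not part of the repository, and the threshold of 50 is taken from the Prereqs list.

```shell
# Sketch: check free disk space on the filesystem that holds a path.

free_gb() {
    # Print the free space, in whole GiB, for the filesystem holding path $1.
    # df -P prints one line per filesystem; -k reports 1024-byte blocks.
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

require_free_gb() {
    # Usage: require_free_gb PATH MIN_GB
    # Succeed only if PATH has more than MIN_GB GiB free.
    local path="$1" min_gb="$2" free
    free=$(free_gb "$path")
    if [ "$free" -le "$min_gb" ]; then
        echo "Only ${free} GiB free on ${path}; need more than ${min_gb} GiB." >&2
        return 1
    fi
    echo "${free} GiB free on ${path}."
}

# Example: require_free_gb /mnt/SCRATCH 50
```

Larger inputs will need more headroom than this minimum, so the threshold is best treated as a floor.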

Run the script

/home/ubuntu/gpas-aws-workflow-runner/workflows/run-workflow.sh

Tasks

DNA-Seq WGS hello world

DNA-Seq WGS

DNA-Seq WXS

RNA-Seq

DNA-Seq WGS Sanger variant calling

DNA-Seq WXS somatic variant calling
