1 | 1 | # User guide |
2 | 2 |
| 3 | +## Introduction |
| 4 | + |
| 5 | +Welcome to the user documentation for the ELIXIR Cloud & AAI ecosystem. With |
| 6 | +this powerful set of services, you'll be able to easily access cloud resources |
| 7 | +and send analysis pipelines to your data with just a few simple commands. |
| 8 | +Imagine being able to run complex genomic analyses on massive datasets without |
| 9 | +worrying about infrastructure limitations or having to manage complex server |
| 10 | +environments. The GA4GH Cloud APIs give you access to powerful tools and |
| 11 | +resources that allow you to focus on your research goals, not IT. |
| 12 | + |
| 13 | +The GA4GH (Global Alliance for Genomics and Health) cloud [APIs][ga4gh-cloud] |
| 14 | +are a set of standard APIs that provide a common interface for accessing |
| 15 | +genomic data and tools across different cloud providers. These APIs are |
| 16 | +essential for enabling genomic data sharing and collaboration, and they have |
| 17 | +been adopted by major cloud providers such as Google Cloud Platform, Microsoft |
| 18 | +Azure, and Amazon Web Services. In this documentation, we'll cover four main |
| 19 | +GA4GH APIs that you'll be using: the Workflow Execution Service |
| 20 | +([WES][ga4gh-wes]), the Task Execution Service ([TES][ga4gh-tes]), the Data |
| 21 | +Repository Service ([DRS][ga4gh-drs]), and the Tool Registry Service |
| 22 | +([TRS][ga4gh-trs]). The WES API allows you to define and execute workflows, |
| 23 | +while the TES API allows you to execute individual tasks within those |
| 24 | +workflows. The DRS API provides a way to access and download genomic data, and |
| 25 | +the TRS API enables the discovery of genomic analysis tools. |
| 26 | + |
| 27 | +Whether you are a bioinformatician or a data scientist, this documentation will |
| 28 | +provide you with all the information you need to start using ELIXIR's GA4GH |
| 29 | +cloud services ecosystem and harness the power of cloud computing for your |
| 30 | +genomic data analysis needs. Let's get started! |
| 31 | + |
| 32 | +## ELIXIR Cloud & AAI deployments |
| 33 | + |
| 34 | +The ELIXIR Cloud & AAI group manages different services and applications as |
| 35 | +part of the ELIXIR cloud framework. For now, these services are listed in a |
| 36 | +dedicated [list of services][elixir-cloud-services]. In the medium term, |
| 37 | +all service instances will be registered in the [ELIXIR Cloud |
| 38 | +Registry][elixir-cloud-registry], an implementation of the [GA4GH Service |
| 39 | +Registry API][ga4gh-service-registry]. |
| 40 | + |
| 41 | +## Task Execution Service (TES) |
| 42 | + |
| 43 | +The GA4GH [TES][ga4gh-tes] specification is a standard interface that enables |
| 44 | +interoperability between workflow management systems and execution engines. The |
| 45 | +TES specification provides a uniform way to submit and monitor tasks to any |
| 46 | +execution engine that implements the specification, allowing users to easily |
| 47 | +switch between workflow management systems or execution engines without |
| 48 | +rewriting their workflows. Typical use cases are: |
| 49 | + |
| 50 | +- Scenario 1: A researcher wants to run a workflow locally. The workflow |
| 51 | + contains some resource-intensive steps, such as requirements for GPUs or many |
| 52 | + cores. Using TES as a backend, the researcher can execute the workflow |
| 53 | + locally and also send the resource-intensive tasks to cloud servers for |
| 54 | + execution. |
| 55 | +- Scenario 2: A researcher wants to run a workflow that involves processing |
| 56 | + data that is stored in cloud locations. Using TES would allow individual |
| 57 | + tasks to be sent to the compute locations associated with each storage |
| 58 | + location. This may be relevant if the data provider does not allow files to |
| 59 | + be downloaded to a central location or if it is not technically feasible. |
| 60 | + |
| 61 | +The TES specification defines an HTTP API for submitting and monitoring tasks |
| 62 | +that includes endpoints for creating, listing, retrieving, and canceling tasks. |
| 63 | +Tasks are defined as JSON objects that include input and output files, one or |
| 64 | +more executors (container image plus command), and any environment variables or |
| 65 | +resources the task requires. The TES specification itself does not model task |
| 66 | +dependencies or retries; those are left to the calling workflow management |
| 67 | +system. Popular TES implementations are [Funnel][funnel] and [TESK][tesk]. |
| 68 | + |
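To make the task structure described above more concrete, here is a minimal Python sketch that builds a TES task document and prepares the HTTP request that would create it. The server URL, storage URLs, container image, and helper names are hypothetical, and the base path (`/ga4gh/tes/v1`) is an assumption that may differ between deployments. Authentication is omitted, and the request is only prepared, not sent.

```python
import json
import urllib.request


def build_tes_task(name, image, command, input_url, output_url):
    """Return a TES task document as a plain dict (illustrative helper)."""
    return {
        "name": name,
        # Files staged into the container before the executor runs.
        "inputs": [{"url": input_url, "path": "/data/input.txt", "type": "FILE"}],
        # Files uploaded to storage after the executor has finished.
        "outputs": [{"url": output_url, "path": "/data/output.txt", "type": "FILE"}],
        "executors": [
            {
                "image": image,
                "command": command,
                "env": {"LC_ALL": "C"},
                # Capture the command's standard output in the output file.
                "stdout": "/data/output.txt",
            }
        ],
        "resources": {"cpu_cores": 1, "ram_gb": 1.0, "disk_gb": 1.0},
    }


def create_task_request(base_url, task):
    """Prepare (but do not send) the POST request that creates a task."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/tasks",
        data=json.dumps(task).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical values, for illustration only.
task = build_tes_task(
    name="md5sum-demo",
    image="alpine:3.19",
    command=["md5sum", "/data/input.txt"],
    input_url="s3://my-bucket/input.txt",
    output_url="s3://my-bucket/output.txt",
)
request = create_task_request("https://tes.example.org/ga4gh/tes/v1", task)
```

Passing `request` to `urllib.request.urlopen` would submit the task. A TES server is expected to answer with a JSON object containing the task `id`, which can then be used to retrieve or cancel the task.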
| 69 | +Several popular workflow management systems, including [cwl-tes][cwl-tes], |
| 70 | +[Snakemake][snakemake] and [Nextflow][nextflow], have implemented the TES |
| 71 | +specification, allowing users to easily run their workflows on any execution |
| 72 | +engine that supports TES. |
| 73 | + |
| 74 | +### Snakemake |
| 75 | + |
| 76 | +Snakemake has supported TES v1.0 since version 5.28.0, as described in the |
| 77 | +[Snakemake documentation][snakemake-docs]. Snakemake executes individual tasks |
| 78 | +as separate workflows that execute only one or a few rules. When using TES, it |
| 79 | +is recommended to use additional remote storage to store input and output |
| 80 | +files. By default, Snakemake TES tasks are executed using the official |
| 81 | +Snakemake container image in the same version as the original Snakemake call. |
| 82 | +To use specific tools, conda environments should be appended to the rules. A |
| 83 | +demo workflow is available |
| 84 | +[here][elixir-cloud-demo-smk]. |
| 85 | + |
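As a rough illustration of the setup described above, the following Python sketch assembles a Snakemake command line that targets a TES endpoint, enables conda environments, and uses remote storage for inputs and outputs. It assumes the command-line flags available in Snakemake 5.28 through 7.x (`--tes`, `--use-conda`, `--default-remote-provider`, `--default-remote-prefix`); newer releases may expose TES differently. The TES URL, bucket prefix, and helper name are hypothetical.

```python
import shlex


def snakemake_tes_command(tes_url, remote_prefix, jobs=1):
    """Build the argument list for a Snakemake run that sends jobs to TES."""
    return [
        "snakemake",
        "--tes", tes_url,                          # TES endpoint to submit tasks to
        "--use-conda",                             # resolve per-rule conda environments
        "--jobs", str(jobs),                       # number of tasks submitted in parallel
        "--default-remote-provider", "S3",         # keep inputs/outputs in remote storage
        "--default-remote-prefix", remote_prefix,  # bucket/prefix for those files
    ]


# Hypothetical endpoint and bucket, for illustration only.
cmd = snakemake_tes_command("https://tes.example.org", "my-bucket/demo")
print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` from the workflow directory. Credentials for the storage provider still need to be supplied separately.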
| 86 | +### cwl-tes |
| 87 | + |
| 88 | +A demo workflow is available [here][elixir-cloud-demo-cwl]. |
| 89 | + |
| 90 | +### Nextflow |
| 91 | + |
| 92 | +!!! warning "Under construction" |
| 93 | + More info coming soon... |
| 94 | + |
| 95 | +## Workflow Execution Service (WES) |
| 96 | + |
| 97 | +The GA4GH [WES][ga4gh-wes] specification is a standard protocol for executing |
| 98 | +and monitoring bioinformatics workflows. It allows researchers to easily |
| 99 | +execute and manage complex analysis pipelines across multiple computing |
| 100 | +platforms and institutions. The WES specification provides a unified API for |
| 101 | +describing workflow inputs and outputs, monitoring job status and progress, and |
| 102 | +managing data transfers. With this specification, users can build scalable, |
| 103 | +reproducible, and interoperable genomics workflows, enabling collaboration |
| 104 | +across institutions and improving data sharing. Two use cases for the GA4GH WES |
| 105 | +specification are: |
| 106 | + |
| 107 | +- Scenario 1: A researcher wants to analyze a large dataset of genomic data |
| 108 | + using a specific analysis pipeline. With the WES specification, the |
| 109 | + researcher can easily define the inputs and parameters for the pipeline, |
| 110 | + select a computing platform that meets their requirements, and submit the job |
| 111 | + for execution. They can then monitor the progress of the job and receive |
| 112 | + notifications when the job is complete. This allows the researcher to focus |
| 113 | + on analyzing the results rather than managing the underlying infrastructure. |
| 114 | + |
| 115 | +- Scenario 2: A clinical laboratory needs to process patient samples for |
| 116 | + genetic testing. The laboratory can use the WES specification to define the |
| 117 | + analysis pipeline and integrate it with its LIMS. This allows the laboratory |
| 118 | + to automate the processing of samples, reducing errors and turnaround time. |
| 119 | + |
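The submit-and-monitor pattern from the scenarios above can be sketched in Python as follows. The sketch only prepares the form fields of a run request and the status URL; it does not contact a server. Field names follow the WES run request (`workflow_url`, `workflow_type`, `workflow_type_version`, `workflow_params`), while the helper names, the workflow URL, the run identifier, and the server base URL are hypothetical. The set of terminal states is an assumption based on WES 1.0 and may be incomplete for newer versions.

```python
import json

# Run states after which a run is not expected to change any more
# (assumed subset, based on WES 1.0).
TERMINAL_STATES = {"COMPLETE", "EXECUTOR_ERROR", "SYSTEM_ERROR", "CANCELED"}


def build_run_form(workflow_url, workflow_type, workflow_type_version, params):
    """Return the form fields for submitting a workflow run."""
    return {
        "workflow_url": workflow_url,
        "workflow_type": workflow_type,
        "workflow_type_version": workflow_type_version,
        # WES expects the workflow parameters as a JSON-encoded string.
        "workflow_params": json.dumps(params),
    }


def status_url(base_url, run_id):
    """Return the URL for polling the state of a run."""
    return f"{base_url.rstrip('/')}/runs/{run_id}/status"


def is_finished(state):
    """Tell whether polling can stop for the given run state."""
    return state in TERMINAL_STATES


# Hypothetical values, for illustration only.
form = build_run_form(
    workflow_url="https://example.org/workflows/hello.cwl",
    workflow_type="CWL",
    workflow_type_version="v1.0",
    params={"message": "Hello, WES!"},
)
poll = status_url("https://wes.example.org/ga4gh/wes/v1", "run-123")
```

A client would send `form` as multipart form data in a `POST` to the `/runs` endpoint, then request `poll` at intervals until `is_finished` returns `True` for the reported state.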
| 120 | +## Data Repository Service (DRS) |
| 121 | + |
| 122 | +!!! warning "Under construction" |
| 123 | + More info coming soon... |
| 124 | + |
| 125 | +## Tool Registry Service (TRS) |
| 126 | + |
3 | 127 | !!! warning "Under construction" |
4 | 128 | More info coming soon... |