Commit 276b506

uniqueg and svedziok authored
docs(user): add first draft (#8)
Co-authored-by: Sven Twardziok <sven.twardziok@charite.de>
1 parent 4a9fcc5 commit 276b506

3 files changed

Lines changed: 141 additions & 0 deletions

docs/guides/guide-user/index.md

Lines changed: 124 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -1,4 +1,128 @@
11
# User guide
22

3+
## Introduction
4+
5+
Welcome to the user documentation for the ELIXIR Cloud & AAI ecosystem. With
6+
this powerful set of services, you'll be able to easily access cloud resources
7+
and send analysis pipelines to your data with just a few simple commands.
8+
Imagine being able to run complex genomic analyses on massive datasets without
9+
worrying about infrastructure limitations or having to manage complex server
10+
environments. The GA4GH Cloud APIs give you access to powerful tools and
11+
resources that allow you to focus on your research goals, not IT.
12+
13+
The GA4GH (Global Alliance for Genomics and Health) cloud [APIs][ga4gh-cloud]
14+
are a set of standard APIs that provide a common interface for accessing
15+
genomic data and tools across different cloud providers. These APIs are
16+
essential for enabling genomic data sharing and collaboration, and they have
17+
been adopted by major cloud providers such as Google Cloud Platform, Microsoft
18+
Azure, and Amazon Web Services. In this documentation, we'll cover four main
19+
GA4GH APIs that you'll be using: the Workflow Execution Service
20+
([WES][ga4gh-wes]), the Task Execution Service ([TES][ga4gh-tes]), the Data
21+
Repository Service ([DRS][ga4gh-drs]), and the Tool Registry Service
22+
([TRS][ga4gh-trs]). The WES API allows you to define and execute workflows,
23+
while the TES API allows you to execute individual tasks within those
24+
workflows. The DRS API provides a way to access and download genomic data, and
25+
the TRS API enables the discovery of genomic analysis tools.
26+
27+
Whether you are a bioinformatician or a data scientist, this documentation will
28+
provide you with all the information you need to start using ELIXIR's GA4GH
29+
cloud services ecosystem and harness the power of cloud computing for your
30+
genomic data analysis needs. Let's get started!
31+
32+
## ELIXIR Cloud & AAI deployments
33+
34+
The ELIXIR Cloud & AAI group manages different services and applications as
35+
part of the ELIXIR cloud framework. For now, these services are
36+
listed in a dedicated [list of services][elixir-cloud-services]. In
37+
the mid-term, all service instances will be registered in the [ELIXIR Cloud
38+
Registry][elixir-cloud-registry], an implementation of the [GA4GH Service
39+
Registry API][ga4gh-service-registry].
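
Once services are registered, a client could discover them by filtering the registry's service list by type. The following is a minimal sketch, assuming the registry returns a list of service records that each carry a `url` and a `type` object with an `artifact` field, as in the GA4GH Service Registry API. The sample records, names, and URLs are invented for illustration and do not describe real ELIXIR deployments.

```python
from typing import Any, Dict, List

# Hypothetical sample of what a registry's service list might contain.
# All identifiers and URLs below are placeholders, not real deployments.
SAMPLE_SERVICES: List[Dict[str, Any]] = [
    {
        "id": "org.example.tes",
        "name": "Example TES",
        "type": {"group": "org.ga4gh", "artifact": "tes", "version": "1.0.0"},
        "url": "https://tes.example.org",
    },
    {
        "id": "org.example.wes",
        "name": "Example WES",
        "type": {"group": "org.ga4gh", "artifact": "wes", "version": "1.0.0"},
        "url": "https://wes.example.org",
    },
]


def services_of_type(services: List[Dict[str, Any]], artifact: str) -> List[str]:
    """Return the base URLs of all services whose type artifact matches."""
    return [
        service["url"]
        for service in services
        if service.get("type", {}).get("artifact") == artifact
    ]


tes_urls = services_of_type(SAMPLE_SERVICES, "tes")
```

In a real client, the sample list would be replaced by the parsed JSON response of the registry's service listing endpoint.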
40+
41+
## Task Execution Service (TES)
42+
43+
The GA4GH [TES][ga4gh-tes] specification is a standard interface that enables
44+
interoperability between workflow management systems and execution engines. The
45+
TES specification provides a uniform way to submit and monitor tasks to any
46+
execution engine that implements the specification, allowing users to easily
47+
switch between workflow management systems or execution engines without
48+
rewriting their workflows. Typical use cases are:
49+
50+
- Scenario 1: A researcher wants to run a workflow locally. The workflow
51+
contains some resource-intensive steps, such as requirements for GPUs or many
52+
cores. Using TES as a backend, the researcher can execute the workflow
53+
locally and also send the resource-intensive tasks to cloud servers for
54+
execution.
55+
- Scenario 2: A researcher wants to run a workflow that involves processing
56+
data that is stored in cloud locations. Using TES would allow individual
57+
tasks to be sent to the compute locations associated with each storage
58+
location. This may be relevant if the data provider does not allow files to
59+
be downloaded to a central location or if it is not technically feasible.
60+
61+
The TES specification defines an HTTP API for submitting and monitoring tasks
62+
that includes endpoints for creating, listing, retrieving, and canceling tasks.
63+
Tasks are defined as JSON objects that include input and output files, a
64+
command to execute, and any environment variables or resources required by the
65+
task. Task dependencies and retries are not part of the TES specification
66+
itself; they are left to the submitting workflow engine. Popular TES implementations are
67+
[Funnel][funnel] and [TESK][tesk].
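
To make the task structure more concrete, here is a minimal sketch of a task document built in Python. The field names (`name`, `inputs`, `outputs`, `resources`, `executors`) follow the TES task schema; the helper `build_tes_task`, the container image, the bucket URLs, and the container paths are assumptions chosen for illustration only.

```python
import json
from typing import Any, Dict, List


def build_tes_task(
    name: str,
    image: str,
    command: List[str],
    input_url: str,
    output_url: str,
) -> Dict[str, Any]:
    """Assemble a TES task document with one input, one output, one executor."""
    return {
        "name": name,
        # Files staged into the container before the command runs.
        "inputs": [{"url": input_url, "path": "/data/input.txt", "type": "FILE"}],
        # Files copied out of the container after the command finishes.
        "outputs": [{"url": output_url, "path": "/data/output.txt", "type": "FILE"}],
        # Requested compute resources (placeholder values).
        "resources": {"cpu_cores": 1, "ram_gb": 2.0, "disk_gb": 10.0},
        # The command to run, the image to run it in, and environment variables.
        "executors": [{"image": image, "command": command, "env": {"LC_ALL": "C"}}],
    }


# Hypothetical example values; replace with your own image and storage URLs.
task = build_tes_task(
    name="md5sum-demo",
    image="alpine:3.19",
    command=["sh", "-c", "md5sum /data/input.txt > /data/output.txt"],
    input_url="s3://example-bucket/input.txt",
    output_url="s3://example-bucket/output.txt",
)
print(json.dumps(task, indent=2))
```

The resulting JSON would be sent in the body of the task creation request of a TES server. The exact base path differs between TES versions and deployments, so check the documentation of the service you are using.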
68+
69+
Several popular workflow management systems, including [cwl-tes][cwl-tes],
70+
[Snakemake][snakemake] and [Nextflow][nextflow], can submit tasks through the TES
71+
API, allowing users to run their workflows on any execution
72+
engine that supports TES.
73+
74+
### Snakemake
75+
76+
Snakemake has supported TES v1.0 since version 5.28.0, as described in the
77+
[Snakemake documentation][snakemake-docs]. Snakemake submits each TES task
78+
as a separate Snakemake run that executes only one or a few rules. When using TES, it
79+
is recommended to use additional remote storage to store input and output
80+
files. By default, Snakemake TES tasks are executed using the official
81+
Snakemake container image in the same version as the original Snakemake call.
82+
To use specific tools, conda environments should be added to the rules. A
83+
demo workflow is available
84+
[here][elixir-cloud-demo-smk].
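
As a rough illustration of the setup described above, the sketch below assembles a Snakemake command line that combines a TES backend, conda environments, and remote storage. It assumes the `--tes`, `--use-conda`, `--default-remote-provider`, and `--default-remote-prefix` options of the Snakemake 5.28 to 7.x command line; later releases may organize these options differently, so verify them against the documentation of your Snakemake version. The helper name, TES URL, and bucket prefix are hypothetical.

```python
import shlex
from typing import List


def snakemake_tes_command(
    tes_url: str,
    remote_provider: str,
    remote_prefix: str,
    jobs: int = 1,
) -> List[str]:
    """Build an argument list for running a workflow against a TES endpoint."""
    return [
        "snakemake",
        "--tes", tes_url,                              # TES server to submit tasks to
        "--use-conda",                                 # resolve per-rule conda environments
        "--default-remote-provider", remote_provider,  # remote storage type
        "--default-remote-prefix", remote_prefix,      # location for inputs and outputs
        "--jobs", str(jobs),                           # number of tasks submitted in parallel
    ]


# Placeholder values for illustration only.
argv = snakemake_tes_command("https://tes.example.org", "S3", "example-bucket/demo")
print(shlex.join(argv))
```

The argument list can be passed to `subprocess.run` from the workflow directory, or the printed command can be pasted into a shell.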
85+
86+
### CWL-tes
87+
88+
A demo workflow is available [here][elixir-cloud-demo-cwl].
89+
90+
### Nextflow
91+
92+
!!! warning "Under construction"
93+
More info coming soon...
94+
95+
## Workflow Execution Service (WES)
96+
97+
The GA4GH [WES][ga4gh-wes] is a standard specification for executing
98+
and monitoring bioinformatics workflows. It allows researchers to easily
99+
execute and manage complex analysis pipelines across multiple computing
100+
platforms and institutions. The WES specification provides a unified API for
101+
submitting workflow runs with their inputs and parameters, monitoring run status
102+
and logs, and canceling runs. With this specification, users can build scalable,
103+
reproducible, and interoperable genomics workflows, enabling collaboration
104+
across institutions and improving data sharing. Two use cases for the GA4GH WES
105+
specification are:
106+
107+
- Scenario 1: A researcher wants to analyze a large dataset of genomic data
108+
using a specific analysis pipeline. With the WES specification, the
109+
researcher can easily define the inputs and parameters for the pipeline,
110+
select a computing platform that meets their requirements, and submit the job
111+
for execution. They can then monitor the progress of the job and receive
112+
notifications when the job is complete. This allows the researcher to focus
113+
on analyzing the results rather than managing the underlying infrastructure.
114+
115+
- Scenario 2: A clinical laboratory needs to process patient samples for
116+
genetic testing. The laboratory can use the WES specification to define the
117+
analysis pipeline and integrate it with its LIMS. This allows the laboratory
118+
to automate the processing of samples, reducing errors and turnaround time.
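
The submit-and-monitor pattern in both scenarios can be sketched as follows. The form field names (`workflow_url`, `workflow_type`, `workflow_type_version`, `workflow_params`) and the run states are taken from the WES specification; the helper names, the workflow URL, and the parameters are assumptions for illustration, and no request is actually sent.

```python
import json
from typing import Any, Dict

# Run states after which a run will no longer change, per the WES state model.
TERMINAL_STATES = {"COMPLETE", "EXECUTOR_ERROR", "SYSTEM_ERROR", "CANCELED"}


def build_wes_run_request(
    workflow_url: str,
    workflow_type: str,
    workflow_type_version: str,
    params: Dict[str, Any],
) -> Dict[str, str]:
    """Assemble the form fields for a WES run submission."""
    return {
        "workflow_url": workflow_url,
        "workflow_type": workflow_type,
        "workflow_type_version": workflow_type_version,
        # WES expects the workflow parameters as a JSON-encoded string.
        "workflow_params": json.dumps(params),
    }


def is_terminal(state: str) -> bool:
    """Return True if a run in this state has finished, failed, or been canceled."""
    return state in TERMINAL_STATES


# Hypothetical workflow location and parameters.
form = build_wes_run_request(
    workflow_url="https://example.org/workflows/align.cwl",
    workflow_type="CWL",
    workflow_type_version="v1.0",
    params={"sample_id": "S1"},
)
```

A client would submit these fields as a multipart form to the run submission endpoint, keep the returned run identifier, and poll the run status until `is_terminal` returns true.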
119+
120+
## Data Repository Service (DRS)
121+
122+
!!! warning "Under construction"
123+
More info coming soon...
124+
125+
## Tool Registry Service (TRS)
126+
3127
!!! warning "Under construction"
4128
More info coming soon...

includes/abbreviations.md

Lines changed: 1 addition & 0 deletions
@@ -4,6 +4,7 @@
44
*[FOSS]: Free & Open Source Software
55
*[GA4GH]: The Global Alliance for Genomics and Health is a policy-framing and technical standards-setting organization, seeking to enable responsible genomic data sharing within a human rights framework.
66
*[GSoC]: Google Summer of Code
7+
*[LIMS]: Laboratory Information Management System
78
*[NBDC]: National Bioscience Database Center
89
*[TES]: GA4GH Task Execution Service API
910
*[TRS]: GA4GH Tool Registry Service API

includes/references.md

Lines changed: 16 additions & 0 deletions
@@ -10,6 +10,7 @@
1010
[conv-commits]: <https://www.conventionalcommits.org/en/v1.0.0-beta.2/#specification>
1111
[conv-commits-blog]: <https://nitayneeman.com/posts/understanding-semantic-commit-messages-using-git-and-angular/>
1212
[conv-commits-lint]: <https://github.com/conventional-changelog/commitlint>
13+
[cwl-tes]: <https://github.com/ohsu-comp-bio/cwl-tes>
1314
[elixir]: <https://elixir-europe.org/>
1415
[elixir-cloud-aai]: <https://elixir-cloud.dcc.sib.swiss/>
1516
[elixir-cloud-aai-contributors]: <https://elixir-cloud.dcc.sib.swiss/contributors>
@@ -21,9 +22,20 @@
2122
[elixir-cloud-aai-news]: <https://elixir-cloud.dcc.sib.swiss/news>
2223
[elixir-cloud-aai-twitter]: <https://twitter.com/ELIXIRcloud_aai>
2324
[elixir-cloud-aai-email]: <mailto:cloud-service@elixir-europe.org>
25+
[elixir-cloud-demo-cwl]: <https://github.com/elixir-cloud-aai/elixir-cloud-demos/tree/main/demos/2023-ecp-f2f>
26+
[elixir-cloud-demo-smk]: <https://github.com/elixir-cloud-aai/demo-tes-hybrid-cloud>
27+
[elixir-cloud-registry]: <https://elixir-cloud.dcc.sib.swiss/ga4gh/registry/v1/ui/>
28+
[elixir-cloud-services]: <https://github.com/elixir-cloud-aai/elixir-cloud-aai/blob/dev/resources/resources.md>
2429
[fair]: <https://www.go-fair.org/fair-principles/>
30+
[funnel]: <https://ohsu-comp-bio.github.io/funnel/>
2531
[ga4gh]: <https://ga4gh.org/>
32+
[ga4gh-cloud]: <https://ga4gh-cloud.github.io/>
2633
[ga4gh-dps]: <https://www.ga4gh.org/how-we-work/driver-projects/>
34+
[ga4gh-drs]: <https://github.com/ga4gh/data-repository-service-schemas>
35+
[ga4gh-service-registry]: <https://github.com/ga4gh-discovery/ga4gh-service-registry>
36+
[ga4gh-tes]: <https://github.com/ga4gh/task-execution-schemas>
37+
[ga4gh-trs]: <https://github.com/ga4gh/tool-registry-service-schemas>
38+
[ga4gh-wes]: <https://github.com/ga4gh/workflow-execution-schemas>
2739
[git]: <https://git-scm.com/>
2840
[git-branch]: <https://git-scm.com/book/en/v2/Git-Branching-Basic-Branching-and-Merging>
2941
[git-commit]: <https://git-scm.com/docs/git-commit>
@@ -60,6 +72,7 @@
6072
[linkedin-ayush]: <https://www.linkedin.com/in/ayush-kumar-514a17197/>
6173
[linkedin-lakshya]: <https://www.linkedin.com/in/lakshyaagarg/>
6274
[linkedin-suyash]: <https://www.linkedin.com/in/sgalpha01/>
75+
[nextflow]: <https://www.nextflow.io/>
6376
[osi]: <https://opensource.org/>
6477
[py]: <https://www.python.org/>
6578
[py-black]: <https://github.com/psf/black>
@@ -75,3 +88,6 @@
7588
[py-pytest]: <https://docs.pytest.org/en/latest/>
7689
[py-typing]: <https://docs.python.org/3/library/typing.html>
7790
[sem-ver]: <https://semver.org/>
91+
[snakemake]: <https://snakemake.readthedocs.io/en/stable/>
92+
[snakemake-docs]: <https://snakemake.readthedocs.io/en/stable/executing/cloud.html#executing-a-snakemake-workflow-via-ga4gh-tes>
93+
[tesk]: <https://github.com/elixir-cloud-aai/TESK>
