Commit cd94259

Author: Sven Twardziok
Commit message: wip notification
1 parent af12a55 commit cd94259

1 file changed

Lines changed: 6 additions & 150 deletions

docs/sandbox/index.md

@@ -1,6 +1,4 @@
-# User guide
-
-## Introduction
+# Sandbox
 
 Welcome to the user documentation for the ELIXIR-on-Cloud ecosystem. With
 this powerful set of services, you'll be able to easily access cloud resources
@@ -10,157 +8,15 @@ worrying about infrastructure limitations or having to manage complex server
 environments. The GA4GH Cloud APIs give you access to powerful tools and
 resources that allow you to focus on your research goals, not IT.
 
-The GA4GH (Global Alliance for Genomics and Health) cloud [APIs][ga4gh-cloud]
-are a set of standard APIs that provide a common interface for accessing
-genomic data and tools across different cloud providers. These APIs are
-essential for enabling genomic data sharing and collaboration, and they have
-been adopted by major cloud providers such as Google Cloud Platform, Microsoft
-Azure, and Amazon Web Services. In this documentation, we'll cover four main
-GA4GH APIs that you'll be using: the Workflow Execution Service
-([WES][ga4gh-wes]), the Task Execution Service ([TES][ga4gh-tes]), the Data
-Repository Service ([DRS][ga4gh-drs]), and the Tool Registry Service
-([TRS][ga4gh-trs]). The WES API allows you to define and execute workflows,
-while the TES API allows you to execute individual tasks within those
-workflows. The DRS API provides a way to access and download genomic data, and
-the TRS API enables the discovery of genomic analysis tools.
-
-Whether you are a bioinformatician or a data scientist, this documentation will
-provide you with all the information you need to start using ELIXIR's GA4GH
-cloud services ecosystem and harness the power of cloud computing for your
-genomic data analysis needs. Let's get started!
+!!! warning "Under construction"
+More info coming soon...
+Idea: introduce ELIXIR-on-Cloud Sandbox cloud - TES instances, dashboard, proTES entrypoint, storage, etc.
 
 ## ELIXIR-on-Cloud deployments
 
-The ELIXIR-on-Cloud group manages different services and appliocations as
+The ELIXIR-on-Cloud group manages different services and applications as
 part of the ELIXIR cloud framework. Currently, these services are temporarily
-listed in a dedicated [services list applications][elixir-cloud-services]. In
+listed in a dedicated [services list applications][elixir-cloud-services]. In
 the mid-term, all services instances will be registered in the [ELIXIR Cloud
 Registry][elixir-cloud-registry], an implementation of the [GA4GH Service
 Registry API][ga4gh-service-registry].
-
-## Task Execution Service (TES)
-
-The GA4GH [TES][ga4gh-tes] specification is a standard interface that enables
-interoperability between workflow management systems and execution engines. The
-TES specification provides a uniform way to submit and monitor tasks to any
-execution engine that implements the specification, allowing users to easily
-switch between workflow management systems or execution engines without
-rewriting their workflows. Typical use cases are
-
-- Scenario 1: A researcher wants to run a workflow locally. The workflow
-contains some resource-intensive steps, such as requirements for GPUs or many
-cores. Using TES as a backend, the researcher can execute the workflow
-locally and also send the resource-intensive tasks to cloud servers for
-execution.
-- Scenario 2: A researcher wants to run a workflow that involves processing
-data that is stored in cloud locations. Using TES would allow individual
-tasks to be sent to the compute locations associated with each storage
-location. This may be relevant if the data provider does not allow files to
-be downloaded to a central location or if it is not technically feasible.
-
-The TES specification defines a HTTP API for submitting and monitoring tasks
-that includes endpoints for creating, querying, updating, and canceling tasks.
-Tasks are defined as JSON objects that include input and output files, a
-command to execute, and any environment variables or resources required by the
-task. The TES specification also includes mechanisms for handling task
-dependencies and retrying failed tasks. Popular TES implementations are
-[Funnel][funnel] and [TESK][tesk].
-
-Several popular workflow management systems, including [cwl-tes][cwl-tes],
-[Snakemake][snakemake] and [Nextflow][nextflow], have implemented the TES
-specification, allowing users to easily run their workflows on any execution
-engine that supports TES.
-
-### Snakemake
-
-Snakemake supports TES v1.0 since version 5.28.0, as described in the
-[Snakemake documentation][snakemake-docs]. Snakemake executes individual tasks
-as separate workflows that execute only one or a few rules. When using TES, it
-is recommended to use additional remote storage to store input and output
-files. By default, Snakemake TES tasks are executed using the official
-Snakemake container image in the same version as the original Snakemake call.
-To use specific tools, conda environments should be appended to the rules. A
-demo workflow is available
-[here][elixir-cloud-demo-smk].
-
-### CWL-tes
-
-A demo workflow is available [here][elixir-cloud-demo-cwl].
-
-### Nextflow
-
-You can find an article about NextFlow with GA4GH TES [here](https://techcommunity.microsoft.com/blog/healthcareandlifesciencesblog/introducing-nextflow-with-ga4gh-tes-a-new-era-of-scalable-data-processing-on-azu/4253160)
-
-To use TES in your Nextflow config, use the plugin `nf-ga4gh`:
-
-```
-plugins {
-id 'nf-ga4gh'
-}
-```
-
-## Workflow Execution Service (WES)
-The GA4GH [WES][ga4gh-wes] is a standard specification protocol for executing
-and monitoring bioinformatics workflows. It allows researchers to easily
-execute and manage complex analysis pipelines across multiple computing
-platforms and institutions. The WES specification provides a unified API for
-describing workflow inputs and outputs, monitoring job status and progress, and
-managing data transfers. With this specification, users can build scalable,
-reproducible, and interoperable genomics workflows, enabling collaboration
-across institutions and improving data sharing. Two use cases for the GA4GH WES
-specification are:
-
-- Scenario 1: A researcher wants to analyze a large dataset of genomic data
-using a specific analysis pipeline. With the WES specification, the
-researcher can easily define the inputs and parameters for the pipeline,
-select a computing platform that meets their requirements, and submit the job
-for execution. They can then monitor the progress of the job and receive
-notifications when the job is complete. This allows the researcher to focus
-on analyzing the results rather than managing the underlying infrastructure.
-
-- Scenario 2: A clinical laboratory needs to process patient samples for
-genetic testing. The laboratory can use the WES specification to define the
-analysis pipeline and integrate it with its LIMS. This allows the laboratory
-to automate the processing of samples, reducing errors and turnaround time.
-
-## Data Repository Service (DRS)
-
-The GA4GH [DRS][ga4gh-drs] API provides a standard set of data retrieval methods
-to access genomic and related health data across different repositories.
-It allows researchers to simplify and standardize data retrieval in cloud-based
-environements. Some key features like Standardized data access that offers a consistent
-API for retrieving datasets. Cloud-agnostic means that it works accross different
-cloud infrastructures. Two use cases for the GA4GH DRS:
-
-- Scenario 1: A researcher wants to run an analysis pipeline on a dataset without
-worrying about where the data physically resides. The researcher uses a DRS ID
-to request the dataset. DRS resolves the ID to the actual storage location and
-provides signed URLs or access tokens and the pipeline retrievess the data
-seamlessly, regardless of the underlying cloud or storage system.
-
-- Scenario 2: A pharmaceutical company is collaborating with hospitals to analyze
-patient genomic data. Due to privacy regulations, raw data cannot be moved outside
-the hospital’s secure environment. The hospital can expose their datasets via DRS
-endpointsand the pharmaceutical company's workflow engine queries DRS to get metadata.
-Finally, the analysis is performed without violating data residency rules.
-
-## Tool Registry Service (TRS)
-
-The GA4GH [TRS][ga4gh-trs] API provides a standard mechanism to list, search and
-register tools and worflows across different platforms and cloud environments.
-It supports workflows written in CWL, WDL, Nextflow, Galaxy, Snakemake.
-Here are examples of two use cases:
-
-- Scenario 1: A bioinformatics researcher develops a workflow for variant calling
-using WDL and Docker containers. They want to share it with collaborators who use
-different platform. TRS can help, the researcher registers the workflow in a
-TRS-compliant registry like Dockstore. The collaborators can discover the workflow
-via TRS API and run it on their platform.
-TRS will ensure that metadata, versioning, and container are standardized and
-accessible
-
-- Scenario 2: A hospital’s genomics lab uses an automated pipeline to analyze patient
-exome data for rare disease diagnosis. The pipeline queries a TRS registry to find
-the latest version of tools (like VEP or GATK), retrieves the workflow descriptor
-and container images. Finally, the pipeline executes the tools in a secure,
-compliant environment.
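
The TES section removed in this commit describes tasks as JSON objects that carry input and output files, a command to execute, and resource requirements. A minimal Python sketch of such a task body follows, assuming the GA4GH TES v1 field names (`executors`, `inputs`, `outputs`, `resources`). The helper `build_tes_task`, the container image, the URLs, and the file paths are hypothetical placeholders, not values from the ELIXIR-on-Cloud documentation.

```python
import json


def build_tes_task(image, command, input_url, output_url):
    """Return a dict shaped like a TES v1 task (illustrative values only)."""
    return {
        "name": "example-task",
        # One executor: the container image and the command to run in it.
        "executors": [{"image": image, "command": command}],
        # Files staged into the container before the command runs.
        "inputs": [{"url": input_url, "path": "/data/input.txt"}],
        # Files copied out of the container after the command finishes.
        "outputs": [{"url": output_url, "path": "/data/output.txt"}],
        # Requested resources; field names assumed from the TES v1 schema.
        "resources": {"cpu_cores": 1, "ram_gb": 2.0},
    }


# Hypothetical image and storage locations, for illustration only.
task = build_tes_task(
    image="alpine:3.19",
    command=["sh", "-c", "wc -l /data/input.txt > /data/output.txt"],
    input_url="s3://example-bucket/input.txt",
    output_url="s3://example-bucket/output.txt",
)

# Serialize to the JSON body a client would send when creating a task.
print(json.dumps(task, indent=2))
```

A client would typically send this JSON to the task-creation endpoint of a TES server; the exact base URL and authentication depend on the deployment and are not shown here.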
