README.md: 52 additions & 41 deletions
@@ -2,9 +2,9 @@
 
 ## Introduction
 
-MARS is a data brokering initiative for submitting multi-omics life sciences studies to multiple specialized repositories.
-It is setup to be modular and enables data producers and multiple data repositories to exchange information seamlessly using the same standardized ISA-JSON format. Unlike a centralized platform, MARS operates as a common framework, allowing for decentralized data submissions while ensuring consistent interpretation and validation of ISA-JSON containing metadata across various repositories.
-The initiative ensures mutual understanding and accurate interpretation of the data, preserving the important links between multi-omics data generated from the same biological source.
+MARS is a data brokering initiative designed to facilitate the submission of multi-omics life sciences studies to [multiple specialized repositories](#isa-json-support-by-repositories). Built as a modular system, MARS enables seamless data exchange between data producers and repositories using the standardized ISA-JSON format.
+
+Unlike centralized platforms, MARS functions as a common framework for decentralized data submissions while ensuring consistent interpretation and validation of ISA-JSON metadata across various repositories. This approach preserves important links between multi-omics datasets derived from the same biological source, ensuring mutual understanding and accurate data interpretation.
 
 ## Stakeholders
 
@@ -13,7 +13,6 @@ MARS is comprised of multiple stakeholders: the end-user, the platform that gene
 
 ## Components
 
-
 
 
 
@@ -26,81 +25,89 @@ We use [ISA-JSON](https://isatools.readthedocs.io/en/latest/isamodel.html) to st
 - **Interoperability**: Since ISA-JSON follows a standard format, it facilitates interoperability between different software tools and platforms that support the ISA standard.
 - **Community Adoption**: Widely adopted within the life sciences research community for metadata standardization.
 
-It is produced by the [ISA-JSON producing platforms](/stakeholders.md#isa-json-producing-platforms) and will be the metadata input of the Data broker platform, see below.
+ISA-JSON is generated by [ISA-JSON producing platforms](/stakeholders.md#isa-json-producing-platforms) and serves as the metadata input for the Data Broker platform, as outlined below.
 
 ### Data broker platform
 
-A platform operated by the [Data broker](/stakeholders.md#data-broker) should:
+A platform operated by the [Data broker](/stakeholders.md#data-broker) should:
 
-* Accept an ISA-JSON as input and submit it to the repositories without any loss of information.
-* Extend the ISA-JSON with additional information provided by the target repositories. For example, the accessions assigned to the submitted objects.
-* Process reporting errors
-* Enable secure credential management and the possibility to set brokering accounts.
-* Supports data transfer through various protocols (e.g. FTP). This would include the verification of the checksums associated to the data files.
-* Allows the Data broker to set up a brokering account or the end-user a personal account.
-* To ensure that the brokering account is not used beyond the purposes defined by the producer. In other words, not to modify or submit in the name of the producer without their consent.
+- Accept ISA-JSON as input and submit it to target repositories without any loss of information.
+- Extend the ISA-JSON with additional information from the repositories, such as accession numbers assigned to submitted objects.
+- Handle error reporting efficiently.
+- Maintain an active submission process throughout its duration (up to multiple days), including waiting for repository-side validation steps to complete.
+- Enable secure credential management.
+- Support data transfer via various protocols (e.g., FTP), ensuring checksum verification for data integrity.
+- Allow the Data broker to set up a brokering account or enable end-users to create personal accounts.
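
The checksum verification named in the list above could look roughly like the following. This is a minimal sketch and not MARS-CLI code: the function names `md5_of` and `verify_checksum` are hypothetical, and MD5 is an assumption, since the text does not say which checksum algorithm the repositories expect.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        # Read fixed-size chunks so large data files never sit fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(path, expected_md5):
    """Return True when the file's digest matches the expected value."""
    # Normalise the expected value: digests are compared case-insensitively.
    return md5_of(path) == expected_md5.strip().lower()
```

A broker platform would typically run such a check before upload and again against whatever checksum the repository reports after the transfer.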
 
-> **To be discussed**:
-> Handling brokering accounts: who creates it? Same for all repositories? Who handles requests to broker data? Can it be done automatically? Are all namespaces for submissions would be shared? Check: https://ena-docs.readthedocs.io/en/latest/faq/data_brokering.html
+Examples of Data broker platforms include ARC, Galaxy, and others.
 
 
-Examples of Data broker platforms are ARC, Galaxy, ...
-
 ### MARS-CLI
 
-This command line tool (CLI) will be used by the Data broker platform and will perform the actual submission of the ISA-JSON to the repositories. Based on receipts repositories give back as response, the ISA-JSON will be updated with accession numbers. The application is build as a Python library which can be integrated in a web application, ARC, Galaxy and others. Source code and documentation can be found in the [mars-cli folder](/mars-cli/) in this repo.
+MARS-CLI is a command-line tool (CLI) used by the Data Broker platform to handle the submission of ISA-JSON metadata to multiple repositories. It automates the submission process, updates ISA-JSON with accession numbers based on repository responses, and ensures smooth data integration.
 
-The main steps of MARS-CLI are:
+Built as a Python library, MARS-CLI can be integrated into web applications, ARC, Galaxy, and other platforms. The source code and documentation are available in the [mars-cli folder](/mars-cli/) in this repository.
 
-1. **Ingesting and validating the ISA-JSON**: Compared to the vanilla ISA specification, the MARS-CLI has certain fields required (for example `target repository` as comment) in order to function properly. Upon ISA-JSON ingestion the information gets loaded in memory and validated at the same time using Pydantic.
+#### Main Steps of MARS-CLI
 
-2. **Identifying the target repositories**: The order of submission can be depended on the target repositories specified in the ISA-JSON.
+1. **Ingesting and Validating the ISA-JSON**
+MARS-CLI requires certain mandatory fields beyond the standard ISA specification (e.g., `target_repository` as a comment). Upon ingestion, the ISA-JSON is loaded into memory and validated using Pydantic to ensure it meets these requirements.
 
-3. **Registering samples in BioSamples**: Submitting an ISA-JSON to a newly developed API at BioSamples. The BioSamples accession will be reused by the other repositories and thus needs to be done first.
-After a successful submission, BioSamples sends back a receipt containing BioSamples accession numbers for `Source` and `Sample` as `Source characteristics` and `Sample characteristics`, respectively.
+2. **Identifying the Target Repositories**
+The order of submission depends on the repositories specified in the ISA-JSON. MARS-CLI determines the correct sequence for submitting metadata and data.
 
-> The source code for the ISA-JSON API for BioSamples can be found in the [repository-services repo](/repository-services/isajson-biosamples/) and can be used for testing
+3. **Registering Samples in BioSamples**
+MARS-CLI first submits the ISA-JSON to BioSamples via a newly developed API. BioSamples accessions are crucial since other repositories reuse them.
 
-4. **Filtering the ISA-JSON**: The ISA-JSON (updated with BioSamples IDs) has to be filtered for every target repository so it only contains information relevant for that repo. This will be facilitated by the `target repository` attribute present in the ISA-JSON assays.
+- After submission, BioSamples returns a receipt containing accessions for `Source` and `Sample`, mapped to `Source characteristics` and `Sample characteristics`, respectively.
+- Source Code: The BioSamples ISA-JSON API can be found in the [repository-services repo](/repository-services/isajson-biosamples/) and is available for testing.
 
-5. **Submitting data to target repositories**: Since some repositories have the requirement that the actual data is already present in their upload space, the MARS-CLI could optionally take care of the data submission. This would guarantee the presence of the the data upon the metadata (ISA-JSON) submission and a checksum check.
+4. **Filtering the ISA-JSON**
+Once updated with BioSamples accessions, the ISA-JSON is filtered for each target repository. This ensures that only relevant metadata is submitted to each repository. The filtering is based on the `target_repository` comment present in the ISA-JSON assays.
 
-6. **Registering ISA-JSON at the target repositories**: Sending the filtered ISA-JSON to the endpoints of the repositories who accept ISA-JSON, for example ENA.
+5. **Submitting Data to Target Repositories**
+MARS-CLI uploads data using FTP. Some repositories require that data files are present in their upload space before metadata submission. This step ensures that the data upload and checksum validation are completed before the metadata is sent.
 
-> The source code for the ISA-JSON API for ENA can be found in the [repository-services repo](/repository-services/isajson-ena/) and can be used for testing
+6. **Registering ISA-JSON at the Target Repositories**
+The filtered ISA-JSON is submitted to repositories that accept ISA-JSON metadata, such as ENA. The MARS project supports the adoption of ISA-JSON by the repositories through so-called adapters.
 
-7. **Processing the receipts and errors from the repositories**: After a successful submission, each repository sends back a receipt in a standard format defined for MARS (see [repository-api info](/repository-services/repository-api.md)). The receipt contains the path of the objects in the ISA-JSON for which an accession number has been generated, and the related accession number.
+- Source Code: The ISA-JSON API for ENA is available in the [repository-services repo](/repository-services/isajson-ena/) and can be used for testing.
 
-8. **Updating the BioSamples External References**: Data broker uses the BioSamples accession numbers to download the submitted BioSamples JSON and extend the `External References` schema by adding the accession numbers provided by the other target archives.
+7. **Processing Repository Receipts and Errors**
+After submission, each repository returns a receipt in a standardized format defined for MARS (see [repository-api info](/repository-services/repository-api.md)).
 
-9. **Dumping back an updated ISA-JSON with repositories' information**: Based on the information in the receipts, the ISA-JSON is populated with accession numbers linked to the submitted metadata objects and can be given back as output.
+- The receipt includes the paths of objects within the ISA-JSON and their assigned accession numbers.
+- Errors encountered during submission are processed and reported accordingly.
 
+8. **Updating BioSamples External References**
+The Data Broker retrieves the BioSamples JSONs of the submitted samples using their accession numbers and updates the `External References` schema by adding the accession numbers assigned by other target repositories.
 
-#### Credential management
+9. **Generating an Updated ISA-JSON with Repository Information**
+Based on the information in repository receipts, the ISA-JSON is updated with accession numbers linked to submitted metadata objects. This final, enriched ISA-JSON can then be provided as output.
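
Steps 2 and 4 of the list above (finding the target repositories and filtering the ISA-JSON per repository) can be illustrated with plain dictionaries. The sketch below is not the MARS-CLI implementation, which validates with Pydantic models. The helper names, the simplified ISA-JSON layout with `studies` at the top level, and the repository identifiers `biosamples`, `ena` and `metabolights` are all assumptions made for illustration.

```python
import copy

# Assumed comment name, taken from the step descriptions above.
TARGET_KEY = "target_repository"


def assay_target(assay):
    """Return the repository named in an assay's comments, or None."""
    for comment in assay.get("comments", []):
        if comment.get("name") == TARGET_KEY:
            return comment.get("value")
    return None


def submission_order(isa):
    """Collect the target repositories and put BioSamples first."""
    targets = []
    for study in isa["studies"]:
        for assay in study["assays"]:
            target = assay_target(assay)
            if target is None:
                # Mirrors step 1: the comment is mandatory for MARS.
                raise ValueError(f"assay without a {TARGET_KEY} comment")
            if target not in targets:
                targets.append(target)
    # Samples are registered in BioSamples before anything else (step 3).
    return ["biosamples"] + [t for t in targets if t != "biosamples"]


def filter_for(isa, repository):
    """Return a copy that keeps only the assays aimed at one repository."""
    filtered = copy.deepcopy(isa)  # leave the caller's ISA-JSON untouched
    for study in filtered["studies"]:
        study["assays"] = [
            assay for assay in study["assays"] if assay_target(assay) == repository
        ]
    return filtered


# Hypothetical, heavily simplified ISA-JSON with two assays.
isa = {
    "studies": [
        {
            "assays": [
                {"filename": "a_ena.txt",
                 "comments": [{"name": TARGET_KEY, "value": "ena"}]},
                {"filename": "a_ml.txt",
                 "comments": [{"name": TARGET_KEY, "value": "metabolights"}]},
            ]
        }
    ]
}

for repository in submission_order(isa)[1:]:
    kept = filter_for(isa, repository)["studies"][0]["assays"]
    print(repository, [assay["filename"] for assay in kept])
```

Filtering on a deep copy keeps the full ISA-JSON available for the final step, where accessions from all repositories are merged back into one document.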
 
-MARS-CLI is not responsible for storing and managing credentials, used for submission to the target repositories. Therefor, credentials should be managed by the [Data broker platform](#data-broker-platform).
+#### Credential management
 
+MARS-CLI can interact with your device's keychain backend to fetch the necessary credentials. This allows the user to set credentials in a safe way.
 
 #### Data submission
 
-MARS-CLI is not to be used as a platform to host data and will not store the data after submission to the target repository. This should be handled by the [Data broker platform](#data-broker-platform). The ISA-JSON provided to the application will be updated and stored in the BioSamples repository as an External Reference, but is otherwise considered as ephemeral.
+MARS-CLI is not a platform to host data and will not store the data after submission to the target repository. This should be handled by the [Data broker platform](#data-broker-platform) where MARS-CLI is installed. The ISA-JSON provided to the application will be updated and stored in the BioSamples repository as an External Reference, but is otherwise considered ephemeral.
 
-=> Data submission could be added to MARS-CLI?
 
 ### ISA-JSON support by repositories
 
-ISA-JSON API services are being developed and deployed by the repositories that are part of the MARS initiative. This includes programmatic submission, the ingestion of ISA-JSON in order to register the metadata objects and the creation of a receipt according to the MARS [repository-api](/repository-services/repository-api.md) standard.
+ISA-JSON API services, also known as adapters, are being developed and deployed by the repositories that are part of the MARS initiative. This includes programmatic submission, the ingestion of ISA-JSON to register the metadata objects, and the creation of a receipt according to the MARS [repository-api](/repository-services/repository-api.md) standard.
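
On the consuming side, a receipt that pairs object paths with accession numbers could be written back into the ISA-JSON roughly as follows. This is a sketch under loud assumptions: the actual receipt layout is defined in repository-api.md and is not reproduced here. The simplified form used below (an `accessions` list whose `path` is a plain list of keys and list indexes), the `accession` field, and the placeholder accession values are all hypothetical.

```python
def apply_receipt(isa, receipt):
    """Write every accession from a receipt into the ISA-JSON, in place."""
    for entry in receipt["accessions"]:
        # The last path element is the field to set; the rest locate its parent.
        *parents, last = entry["path"]
        node = isa
        for key in parents:
            node = node[key]  # works for dict keys and list indexes alike
        node[last] = entry["value"]
    return isa


# Hypothetical ISA-JSON fragment and receipt, for illustration only.
isa = {"studies": [{"title": "Study A", "assays": [{"filename": "a_ena.txt"}]}]}
receipt = {
    "accessions": [
        {"path": ["studies", 0, "accession"], "value": "ACC-STUDY-1"},
        {"path": ["studies", 0, "assays", 0, "accession"], "value": "ACC-ASSAY-1"},
    ]
}

updated = apply_receipt(isa, receipt)
print(updated["studies"][0]["accession"])
```

Because each entry carries its own path, receipts from different repositories can be applied one after another to the same ISA-JSON without conflicting.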
 
 Track the status of each repository here:
 
-| Repository | Programmatic submission | Development status | Deployed?| Source code |
+| Repository | Programmatic submission | Development status | Deployed | Source code |
 |---|---|---|---|---|
-|[BioSamples](https://www.ebi.ac.uk/biosamples/)| yes |Ready to be tested| no |[GitHub](repository-services/isajson-biosamples)|
-|[ENA](https://www.ebi.ac.uk/ena/browser/)| yes |Ready to be tested| no |[GitHub](repository-services/isajson-ena)|
-|[MetaboLights](https://www.ebi.ac.uk/metabolights/)|NA|Not started| no ||
+|[BioSamples](https://www.ebi.ac.uk/biosamples/)| yes |PoC being improved| no |[GitHub](repository-services/isajson-biosamples)|
+|[ENA](https://www.ebi.ac.uk/ena/browser/)| yes |PoC being improved| no |[GitHub](repository-services/isajson-ena)|
+|[MetaboLights](https://www.ebi.ac.uk/metabolights/)|yes|Proof of concept| no ||
 |[BioStudies/ArrayExpress](https://www.ebi.ac.uk/biostudies/arrayexpress)| yes, in dev | Not started | no ||
 |[e!DAL-PGP](https://edal-pgp.ipk-gatersleben.de/)| NA | Not started | no ||
-| Your repository here? Join MARS! |
+| Your repository here? Join MARS! |||||
 
 ## File structure in this repo
 
@@ -126,3 +133,7 @@ Track the status of each repository here:
 - **repository-api.md**: Describing the receipt standard for repository APIs to follow.
 - **test-data**: Test data to be used in a submission.
 - **README.md**: This file
+
+## Acknowledgements
+
+This project was initiated during the ELIXIR Europe BioHackathon 2022 and has since received continued support through subsequent ELIXIR Hackathons and the ELIXIR Data Platform WP2.