Commit 28cfbac

fix: add explicit ids
1 parent 0a49972 commit 28cfbac

1 file changed

Lines changed: 15 additions & 15 deletions

docs/00_intro/10_fair.mdx

@@ -28,13 +28,13 @@ In the following, we answer the questions: What makes data FAIR? What do researc
2828

2929
Researchers — and the computers working on their behalf — must be able to find datasets to be able to reuse them. Therefore, the first guideline of the FAIR Data Principles outlines methods to ensure a dataset’s discovery.
3030

31-
### F1. (meta)data are assigned a globally unique and persistent identifier
31+
### F1. (meta)data are assigned a globally unique and persistent identifier {#f1}
3232

3333
A globally unique and [persistent identifier (PID)](/docs/pid) helps both machines and humans find the data in the first place. These PIDs are essential for research as they guarantee the availability of the associated resource, in this case a dataset. The registry services that make these identifiers available work to maintain the link to the resource, thus avoiding dead links. This ensures the resource remains findable and may be referenced simply by the use of its PID.
3434

3535
A common example of a citable PID is the Digital Object Identifier, or [DOI](https://doi.org/10.1000/182). As with many journals, scientific data repositories often assign a DOI automatically. The Registry of Research Data Repositories, [re3data](https://www.re3data.org/), indicates whether a given repository assigns an identifier, along with the PID type. For example, both the [Cambridge Structural Database (CSD)](https://www.ccdc.cam.ac.uk/solutions/csd-system/components/csd/) and the [Chemotion Repository](https://www.chemotion-repository.net/) assign DOIs to each dataset deposited. Researchers must be aware of this option when searching for a suitable repository, while repositories should offer this service.
3636

37-
### F2. data are described with rich metadata (defined by R1 below)
37+
### F2. data are described with rich metadata (defined by R1 below) {#f2}
3838

3939
Data need to be sufficiently described in order to make them both findable and reusable. Hence, the specific focus here lies on making the (meta)data findable by using rich discovery [metadata](/docs/metadata) in a standardized format and allowing computers and humans to quickly understand the dataset’s contents. This is an essential component in the plurality of metadata described by [R1](#r1-metadata-are-richly-described-with-a-plurality-of-accurate-and-relevant-attributes) below. This information may include, but is not limited to:
4040

@@ -46,34 +46,34 @@ Data need to be sufficiently described in order to make them both findable and r
4646

4747
Repositories should provide researchers with a fillable [application profile](https://en.wikipedia.org/wiki/Application_profile) that allows researchers to give extensive and precise information on their deposited datasets. For example, the Chemotion Repository uses, among others, the [Datacite Metadata Schema](http://doi.org/10.5438/0012) to build its application profile, a schema specifically created for the publication and citation of research data. [RADAR](https://radar.products.fiz-karlsruhe.de/en), including the variant [RADAR4Chem](https://www.nfdi4chem.de/index.php/2650-2/), has also built [its metadata schema](https://radar.products.fiz-karlsruhe.de/en/radarfeatures/radar-metadatenschema) on Datacite. These include an assortment of mandatory, recommended, and optional metadata properties, allowing for a rich description of the deposited dataset. For those publishing data, always keep in mind: the more information provided, the better.
4848

49-
### F3. metadata clearly and explicitly include the identifier of the data it describes
49+
### F3. metadata clearly and explicitly include the identifier of the data it describes {#f3}
5050

5151
While [F1](#f1-metadata-are-assigned-a-globally-unique-and-persistent-identifier) stipulates the assignment of an identifier, F3 underlines the importance of including this identifier in the metadata itself. The metadata and the dataset it describes are typically separate files. Including the identifier in the metadata directly links the information to the associated dataset.
5252

5353
Furthermore, the dataset may not be published alongside the metadata. For example, in the case of unpublished archived datasets, the PID can lead to a method (e.g. a landing page) to contact those responsible for the data instead of to the dataset itself.
5454
Researchers must be aware of this importance, while repositories must not only assign a PID as described in [F1](#f1-metadata-are-assigned-a-globally-unique-and-persistent-identifier) above, but should also ensure that this PID is a required property of the metadata.
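
The requirement that the PID be a mandatory metadata property can be illustrated with a minimal sketch. The field name `identifier`, the helper `missing_required_fields`, and the example DOI are assumptions made for illustration; they are not taken from any particular repository's schema.

```python
def missing_required_fields(metadata: dict, required=("identifier",)) -> list:
    """Return the required fields that are absent or empty in a metadata record.

    The field name "identifier" is a hypothetical stand-in for whatever
    property a repository's schema uses to hold the dataset's PID.
    """
    missing = []
    for field in required:
        value = metadata.get(field)
        # Treat missing keys, non-string values, and blank strings as missing.
        if not isinstance(value, str) or not value.strip():
            missing.append(field)
    return missing


# Hypothetical records; the DOI below is a placeholder, not a real dataset.
complete = {"title": "Example dataset", "identifier": "10.1234/example"}
incomplete = {"title": "Dataset without a PID"}

print(missing_required_fields(complete))
print(missing_required_fields(incomplete))
```

A repository's deposit form could run a check of this kind before accepting a record, so that metadata never ends up detached from the dataset it describes.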
5555

56-
### F4. (meta)data are registered or indexed in a searchable resource
56+
### F4. (meta)data are registered or indexed in a searchable resource {#f4}
5757

5858
Metadata are used to set up indices, enabling machines to efficiently search for and find datasets. For this process to work successfully, metadata must be complete as outlined [above](#f2-data-are-described-with-rich-metadata-defined-by-r1-below). Repositories should ensure the metadata entered for a deposited dataset is available in a machine-readable format to facilitate the assignment of indices.
5959

6060
## Accessible
6161

6262
Accessible means that humans and machines receive instructions on how to obtain the data. It should be noted that FAIR does not equate to open, as further explained in [A1.2](#a12-the-protocol-allows-for-an-authentication-and-authorization-procedure-where-necessary).
6363

64-
### A1. (meta)data are retrievable by their identifier using a standardized communications protocol
64+
### A1. (meta)data are retrievable by their identifier using a standardized communications protocol {#a1}
6565

6666
To guarantee access to datasets, persistent identifiers such as DOIs are recommended, since they are resolved by standard methods. Common protocols include HTTP(S) and (S)FTP.
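
As a rough sketch of what "retrievable by their identifier" means in practice, a DOI can be turned into an HTTPS URL at the public `https://doi.org/` resolver, which then redirects to the resource's landing page. The helper name `doi_to_url` and its light validation are assumptions for illustration; no network request is made here.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"  # public DOI proxy


def doi_to_url(doi: str) -> str:
    """Build a resolvable HTTPS URL from a DOI string (hypothetical helper)."""
    doi = doi.strip()
    # Accept common written forms such as "doi:10.1000/182".
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    # Very light sanity check: DOIs start with "10." and contain a slash.
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode anything unsafe in a URL path, but keep the slash.
    return DOI_RESOLVER + quote(doi, safe="/")


print(doi_to_url("doi:10.1000/182"))
```

Any standard HTTP client can then follow the resulting URL, which is the point of relying on an open, widely implemented protocol.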
6767

68-
#### A1.1 the protocol is open, free, and universally implementable
68+
#### A1.1 the protocol is open, free, and universally implementable {#a1_1}
6969

7070
Repositories should only use protocols that allow any computer to access at least the metadata. Not only does this refer to the use of standard communication protocols, as stated in [A1](#a1-metadata-are-retrievable-by-their-identifier-using-a-standardized-communications-protocol), these protocols must also be freely available and open-sourced. Therefore, proprietary or non-standard protocols should be avoided.
7171

72-
#### A1.2 the protocol allows for an authentication and authorization procedure, where necessary
72+
#### A1.2 the protocol allows for an authentication and authorization procedure, where necessary {#a1_2}
7373

7474
Where necessary, machine-readable protocols that let the user know that action needs to be taken (such as a login) to access data must be in place. FAIR data and open data are not synonymous: FAIR data requires that it must be clearly stated how the data can be accessed, as opposed to granting anyone and everyone full access. In manuscripts of scientific articles, this information should be included in a [data availability statement](/docs/data_availability_statement). This can be especially important for sensitive data, where, for example, personal data and/or medical information may be disclosed. Hence, repositories should also provide a way for users (and their computers) to identify themselves, enabling access permission to be granted.
7575

76-
### A2. metadata are accessible, even when the data are no longer available
76+
### A2. metadata are accessible, even when the data are no longer available {#a2}
7777

7878
The metadata that describes a dataset should be stored in a separate file so it is available, even if the datasets themselves can no longer be accessed. Problems with dataset availability are usually due to 1) the cost of maintaining and storing full datasets and 2) file format deprecation as technologies evolve. Maintaining metadata files is cheaper and simpler and ensures that, at a minimum, details such as contact information remain available. These files should thus be archived forever.
7979

@@ -83,26 +83,26 @@ A repository should clearly state a contingency plan for metadata storage should
8383

8484
Data need to be integrated with and/or compared to other datasets, while computers must be able to interpret and exchange the information. Ideally, they are compatible with standard applications and can thus be integrated into (automated) processing and analysis workflows. Interoperability often functions as a precursor to [reusability](#reusable), as it ensures the compatibility across systems.
8585

86-
### I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation
86+
### I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation {#i1}
8787

8888
Machines need to be able to understand how to exchange and interpret information. As with humans, a uniform, standard language aids this understanding. In chemistry, a typical example of such an information exchange standard is the [Crystallographic Information File (CIF)](https://doi.org/10.1107/97809553602060000107). This standard also adheres to the aspects described in [I2](#i2-metadata-use-vocabularies-that-follow-fair-principles) and [R1.3](#r13-metadata-meet-domain-relevant-community-standards) below. Simply put, [standard file formats](/docs/format_standards) for a given analytical method ensure that the data and the associated metadata, which typically include measurement details, follow a prescribed format. This ensures both humans and machines receive the information required to interpret the data.
8989

9090
Especially when looking at metadata, effective and efficient machine readability greatly depends on being able to reduce ambiguity. Metadata provides context to datasets. However, machines need to be able to interpret this context. Therefore, the structured schemas chosen by the repositories should include universally applied [ontologies](https://terminology.nfdi4chem.de/ts/ontologies) and controlled vocabularies to define relationships and avoid ambiguity. For example, chemistry-specific repositories should be designed to include [ontologies](/docs/ontology) such as the [Chemical Methods Ontology](https://terminology.nfdi4chem.de/ts/ontologies/chmo) (CHMO) or the [Chemical Information Ontology](https://terminology.nfdi4chem.de/ts/ontologies/cheminf) (CHEMINF) to accurately describe the (meta)data provided. Such ontologies should be based on widely-applied data models, for instance, the
9191
[Resource Description Framework (RDF)](https://www.w3.org/RDF/).
9292

93-
### I2. (meta)data use vocabularies that follow FAIR principles
93+
### I2. (meta)data use vocabularies that follow FAIR principles {#i2}
9494

9595
The applied vocabularies or ontologies should be well-documented and resolvable using a PID. For instance, CHMO mentioned [above](#i1-metadata-use-a-formal-accessible-shared-and-broadly-applicable-language-for-knowledge-representation) uses a [persistent URL (PURL)](https://en.wikipedia.org/wiki/Persistent_uniform_resource_locator), resolvable using a standard web browser through `http`, while the [documentation](https://github.com/rsc-ontologies/rsc-cmo) is publicly available on GitHub.
9696

97-
### I3. (meta)data include qualified references to other (meta)data
97+
### I3. (meta)data include qualified references to other (meta)data {#i3}
9898

9999
Related datasets should be linked in a reliable manner, preferably via their PIDs. This includes any previous versions, datasets required to fully use and comprehend the current dataset, or datasets that the dataset builds upon. This relationship should also be described in a meaningful manner. For example, if dataset X is a previous version of dataset Y, it would be described as such rather than simply being described as a related or an associated dataset. Repositories should include a method of referring to other datasets in their metadata form.
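
The difference between a qualified and an unqualified reference can be sketched as follows. The class and field names are assumptions for illustration, and the DOIs are placeholders. The relation names `IsPreviousVersionOf` and `IsNewVersionOf` follow the relation types of the DataCite Metadata Schema, which the text above mentions as a basis for repository application profiles.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelatedIdentifier:
    """A link from one dataset to another (hypothetical structure)."""
    identifier: str       # PID of the related dataset
    identifier_type: str  # e.g. "DOI"
    relation_type: str    # how the two datasets relate


# Relation labels that say nothing about *how* the datasets are related.
UNQUALIFIED = {"", "related", "associated"}


def is_qualified(ref: RelatedIdentifier) -> bool:
    """Return True if the reference states a meaningful relationship."""
    return ref.relation_type.strip().lower() not in UNQUALIFIED


# Dataset Y pointing back at its previous version X (placeholder DOIs).
good = RelatedIdentifier("10.1234/dataset-x", "DOI", "IsNewVersionOf")
vague = RelatedIdentifier("10.1234/dataset-x", "DOI", "related")

print(is_qualified(good), is_qualified(vague))
```

A metadata form that offers a fixed list of relation types, rather than a single free-text "related datasets" field, nudges depositors toward the qualified variant.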
100100

101101
## Reusable
102102

103103
Many of the previous points lead to one key aspect of data sharing: data reusability. Datasets must be described in a manner that allows the user to easily determine how and under which conditions the data can be reused.
104104

105-
### R1. (meta)data are richly described with a plurality of accurate and relevant attributes
105+
### R1. (meta)data are richly described with a plurality of accurate and relevant attributes {#r1}
106106

107107
Related to [F2](#f2-data-are-described-with-rich-metadata-defined-by-r1-below) above, the focus here lies on whether the data, once found, is usable to the person or computer searching. It also stresses giving the data as many attributes as possible. Researchers should not assume the person (or that person’s computer) looking to (re)use their data is completely familiar with the discipline. Examples of information to assign here include (non-exhaustive list):
108108

@@ -122,15 +122,15 @@ An important piece of information for chemical data are [machine-readable chemic
122122

123123
Repositories should provide data publishers with the opportunity to include a plurality of information in their metadata. This includes giving a wide range of optional and free-fill fields for data publishers to complete.
124124

125-
#### R1.1. (meta)data are released with a clear and accessible data usage license
125+
#### R1.1. (meta)data are released with a clear and accessible data usage license {#r1_1}
126126

127127
The metadata should include human and machine-readable use conditions, such as a [licence](/docs/licences). [Creative Commons](https://creativecommons.org/) licences are commonly used for scientific data. [re3data](https://www.re3data.org/) lists whether a repository allows researchers to directly select a licence or terms of use agreement when depositing data. At a minimum, repositories should allow researchers to add a licence file.
128128

129-
#### R1.2. (meta)data are associated with detailed provenance
129+
#### R1.2. (meta)data are associated with detailed provenance {#r1_2}
130130

131131
In simple terms: metadata include any relevant history. If the dataset is related to other datasets or based on another researcher’s data, these should be linked via their PID as described in I3. This includes citing or acknowledging others for their work, which also takes their licensing or use agreements into consideration (see [R1.1](#r11-metadata-are-released-with-a-clear-and-accessible-data-usage-license)). Furthermore, metadata should contain machine-readable information on how the data was generated or processed.
132132

133-
#### R1.3. (meta)data meet domain-relevant community standards
133+
#### R1.3. (meta)data meet domain-relevant community standards {#r1_3}
134134

135135
As research data management and, as such, [data publishing](/docs/data_publishing) become more and more prevalent across research areas, [best practices](/docs/best_practice) in the individual communities will arise. These should encompass metadata templates for proper documentation of datasets, how the data should be [organized](/docs/data_organisation), which vocabularies or [ontologies](/docs/ontology) to use, and [file formats](/docs/format_standards). NFDI4Chem is working to establish [metadata and data standards](https://www.nfdi4chem.de/index.php/task-areas/) for the various communities in chemistry.
136136
