In 2016 FAIR Guiding Principles for scientific data management and stewardship’ were published in Scientific Data. The authors intended to provide guidelines to improve the findability, accessibility, interoperability, and reuse of digital assets.
The principles emphasise machine-actionability (i.e., the capacity of computational systems to find, access, interoperate, and reuse data with none or minimal human intervention) because humans increasingly rely on computational support to deal with data as a result of the increase in volume, complexity, and creation speed of data.
The principles refer to three types of entities:
- data (or any digital object),
- metadata (information about that digital object),
- Retrieve non-FAIR data: gain access to the data to be FAIRified.
- Analyse the retrieved data:
- inspect the content of the data:
- Which concepts are represented?
- What is the structure of the data?
- What are the relations between the data elements? - Different data distributions require different methods for identification and analysis.
- Define the semantic model: define a ‘semantic model’ for the dataset, which describes the meaning of entities and relations in the dataset accurately, unambiguously, and in a computer-actionable way.
- Make data linkable: The non-FAIR data can be transformed into linkable data by applying the semantic model defined in step 3. Currently, this is done using Semantic Web and Linked Data technologies.
- Assign license: Although license information is part of the metadata, we have incorporated the license assignment as a separate step in the FAIRification process to highlight its importance. The absence of an explicit license may prevent others to reuse data, even if the data is intended to be open access.
- Define metadata for the dataset: As explained by many of the FAIR principles, proper and rich metadata support all aspects of FAIR. (Read the GO FAIR recommendation for metadata.)
- Deploy FAIR data resource: deploy or publish the FAIRified data, together with relevant metadata and a license, so that the metadata can be indexed by search engines and the data can be accessed, even if authentication and authorisation are required.
The first step in (re)using data is to find them. Metadata and data should be easy to find for both humans and computers. Machine-readable metadata are essential for automatic discovery of datasets and services, so this is an essential component of the FAIRification process.
F1. (Meta)data are assigned a globally unique and persistent identifier
- Identifiers.org provides resolvable identifiers in the form of URIs and CURIEs: http://identifiers.org
- Universally unique identifier: https://en.wikipedia.org/wiki/Universally_unique_identifier
- Persistent URLs: http://www.purlz.org
- Digital Object Identifier: http://www.doi.org
- Archival Resource Key: https://escholarship.org/uc/item/9p9863nc
- Research Resource Identifiers: https://scicrunch.org/resources
- Identifiers for funding organisations (see F3 & R1): https://www.crossref.org/services/funder-registry/
- Identifiers for the world’s research organisations (see F3 & R1): https://www.grid.ac
F2. Data are described with rich metadata (defined by R1 below)
F3. Metadata clearly and explicitly include the identifier of the data they describe
- The FAIRifier tool guarantees F3
F4. (Meta)data are registered or indexed in a searchable resource
Once the user finds the required data, she/he needs to know how can they be accessed, possibly including authentication and authorisation.
A1. (Meta)data are retrievable by their identifier using a standardised communications protocol
A1.1 The protocol is open, free, and universally implementable
A1.2 The protocol allows for an authentication and authorisation procedure, where necessary
A2. Metadata are accessible, even when the data are no longer available
- ISO19115 and 19115-2 CodeList, ISO 19115-3, Guide to implementing ISO 19115:2003(E)
- Nederlands metadata profiel op ISO 19119 voor services versie 2.1.0
The data usually need to be integrated with other data. In addition, the data need to interoperate with applications or workflows for analysis, storage, and processing.
I1. (Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.
- The RDF extensible knowledge representation model is a way to describe and structure datasets. You can refer to the Dublin - Core Schema as an example.
- JSON LD
- See example data models for Predicted gene-disease associations from text mining and Tissue gene expression.
- See data models from EBI in the ‘documentation’ links on this page http://www.ebi.ac.uk/rdf/
I2. (Meta)data use vocabularies that follow FAIR principles
I3. (Meta)data include qualified references to other (meta)data
The ultimate goal of FAIR is to optimise the reuse of data. To achieve this, metadata and data should be well-described so that they can be replicated and/or combined in different settings.
R1. Meta(data) are richly described with a plurality of accurate and relevant attributes
- Describe the scope of your data: for what purpose was it generated/collected?
- Mention any particularities or limitations about the data that other users should be aware of.
- Specify the date of generation/collection of the data, the lab conditions, who prepared the data, the parameter settings, the name and version of the software used.
- Is it raw or processed data?
- Ensure that all variable names are explained or self-explanatory (i.e., defined in the research field’s controlled vocabulary).
- Clearly specify and document the version of the archived and/or reused data.
R1.1. (Meta)data are released with a clear and accessible data usage license
R1.2. (Meta)data are associated with detailed provenance
R1.3. (Meta)data meet domain-relevant community standards
It is easier to reuse data sets if they are similar: same type of data, data organised in a standardised way, well-established and sustainable file formats, documentation (metadata) following a common template and using common vocabulary.
- DataCite Metadata Schema (for general purpose, not domain-specific)
- Dublin Core Metadata Initiative (for general purpose, not domain-specific)
- geographic information and services
- climate and forecast
- crystallographic information
- neutron, x-ray, and muon experiment data
- social, behavioral, and economic sciences
- statistical data
A GO FAIR Implementation Network (IN) is a consortium committed to defining and creating specific materials and tools as elements of the Internet of FAIR Data and Services (IFDS).
The INs commit to implementing elements of the Internet of FAIR Data and Services within the three pillars: GO Build (Technology), GO Change (Culture) and GO Train (Training).
- IMPLEMENT, clearly defined plans and deliverables to implement an element of the Internet of FAIR Data and Services (IFDS) within a defined time period.
- FOSTER, a community of harmonized FAIR practices.
- COMMUNICATE, together on critical issues on which consensus has been reached and which are of generic importance for the community.
How to join
- Answer to the FAIR Data Principles: The GO FAIR implementation plan for the Internet of FAIR Data and Services (IFDS) as a whole will answer to the FAIR Guiding Principles. This means that data resources, services, and training materials will be developed according to these principles and will be adorned with rich, machine-readable metadata, and that they will thus be Findable, Accessible, Interoperable, and Reusable under well-defined conditions, by machines and humans.
- Abide by the Governance Principles: A GO FAIR partner should formally acknowledge and endorse the general Governance principles of the GO FAIR initiative.
- Accept to be stakeholder-governed: The GO FAIR implementation approach for the IFDS is stakeholder-governed. A self-coordinating, board-governed organisation drawn from the stakeholder Implementation Network community creates trust that the organisation will take decisions driven by community consensus, considering different interests.
- Accept non-discriminatory membership: When willing to sign the Rules of Engagement, any stakeholder may express an interest in and should be welcome to join GO FAIR.
- Conduct transparent operations: Achieving trust in the selection of representatives in governance groups will be best achieved through transparent processes and operations in general (within the constraints of privacy laws).
- Not abuse its trusted provider or GO FAIR status for undue lobbying for its own services, especially with the aim to monopolise critical components of the IFDS.