Whip: Communicate and Test What to Expect from Data

Research output: Contribution to journal; A2: Article in a peer-reviewed journal that is not included in A1; Research

Standard

Whip: Communicate and Test What to Expect from Data. / Van Hoey, Stijn; Desmet, Peter.

In: Biodiversity Information Science and Standards, Vol. 2, 18.05.2018, p. e25317.

BibTeX

@article{cb16cf6c2a6741afb3e9eda36db8744d,
title = "Whip: Communicate and Test What to Expect from Data",
abstract = "The ability to communicate and assess the quality and fitness for use of data is crucial to ensure maximum utility and re-use. Data consumers have certain requirements for the data they seek and need to be able to check if a data set conforms with these requirements. Data publishers aim to provide data with the highest possible quality and need to be able to identify potential errors that can be addressed with the available information at hand. The development and adoption of data publication guidelines is one approach to define and meet those requirements. However, the use of a guideline, the mapping decisions, and the requirements a dataset is expected to meet, are generally not communicated with the provided data. Moreover, these guidelines are typically intended for humans only. In this talk, we will present 'whip': a proposed syntax for data specifications. With whip, one can define column-based constraints for tabular (tidy) data using a number of rules, e.g. how data is structured following Darwin Core, how a term uses controlled vocabulary values, or what the expected minimum and maximum values are. These rules are human- and machine-readable, which communicates the specifications, and allows to automatically validate those in pipelines for data publication and quality assessment, such as Kurator. Whip can be formatted as a (yaml) text file that can be provided with the published data, communicating the specifications a dataset is expected to meet. The scope of these specifications can be specific to a dataset, but can also be used to express expected data quality and fitness for use of a publisher, consumer or community, allowing bottom-up and top-down adoption. As such, these specifications are complementary to the core set of data quality tests as currently under development by the TDWG Biodiversity Data Quality Task Group 2.
Whip rules are currently generic, but more specific ones can be defined to address requirements for biodiversity information.",
author = "Van Hoey, Stijn and Desmet, Peter",
year = "2018",
month = "5",
day = "18",
doi = "10.3897/biss.2.25317",
language = "English",
volume = "2",
pages = "e25317",
journal = "Biodiversity Information Science and Standards",
publisher = "Pensoft Publishers",

}

RIS

TY - JOUR

T1 - Whip: Communicate and Test What to Expect from Data

AU - Van Hoey, Stijn

AU - Desmet, Peter

PY - 2018/5/18

Y1 - 2018/5/18

N2 - The ability to communicate and assess the quality and fitness for use of data is crucial to ensure maximum utility and re-use. Data consumers have certain requirements for the data they seek and need to be able to check if a data set conforms with these requirements. Data publishers aim to provide data with the highest possible quality and need to be able to identify potential errors that can be addressed with the available information at hand. The development and adoption of data publication guidelines is one approach to define and meet those requirements. However, the use of a guideline, the mapping decisions, and the requirements a dataset is expected to meet, are generally not communicated with the provided data. Moreover, these guidelines are typically intended for humans only. In this talk, we will present 'whip': a proposed syntax for data specifications. With whip, one can define column-based constraints for tabular (tidy) data using a number of rules, e.g. how data is structured following Darwin Core, how a term uses controlled vocabulary values, or what the expected minimum and maximum values are. These rules are human- and machine-readable, which communicates the specifications, and allows to automatically validate those in pipelines for data publication and quality assessment, such as Kurator. Whip can be formatted as a (yaml) text file that can be provided with the published data, communicating the specifications a dataset is expected to meet. The scope of these specifications can be specific to a dataset, but can also be used to express expected data quality and fitness for use of a publisher, consumer or community, allowing bottom-up and top-down adoption. As such, these specifications are complementary to the core set of data quality tests as currently under development by the TDWG Biodiversity Data Quality Task Group 2.
Whip rules are currently generic, but more specific ones can be defined to address requirements for biodiversity information.

AB - The ability to communicate and assess the quality and fitness for use of data is crucial to ensure maximum utility and re-use. Data consumers have certain requirements for the data they seek and need to be able to check if a data set conforms with these requirements. Data publishers aim to provide data with the highest possible quality and need to be able to identify potential errors that can be addressed with the available information at hand. The development and adoption of data publication guidelines is one approach to define and meet those requirements. However, the use of a guideline, the mapping decisions, and the requirements a dataset is expected to meet, are generally not communicated with the provided data. Moreover, these guidelines are typically intended for humans only. In this talk, we will present 'whip': a proposed syntax for data specifications. With whip, one can define column-based constraints for tabular (tidy) data using a number of rules, e.g. how data is structured following Darwin Core, how a term uses controlled vocabulary values, or what the expected minimum and maximum values are. These rules are human- and machine-readable, which communicates the specifications, and allows to automatically validate those in pipelines for data publication and quality assessment, such as Kurator. Whip can be formatted as a (yaml) text file that can be provided with the published data, communicating the specifications a dataset is expected to meet. The scope of these specifications can be specific to a dataset, but can also be used to express expected data quality and fitness for use of a publisher, consumer or community, allowing bottom-up and top-down adoption. As such, these specifications are complementary to the core set of data quality tests as currently under development by the TDWG Biodiversity Data Quality Task Group 2.
Whip rules are currently generic, but more specific ones can be defined to address requirements for biodiversity information.

U2 - 10.3897/biss.2.25317

DO - 10.3897/biss.2.25317

M3 - A2: Article in a journal with peer review, not included in A1

VL - 2

SP - e25317

JO - Biodiversity Information Science and Standards

JF - Biodiversity Information Science and Standards

ER -
