Automatically assuring Data Quality aspects in digital research metadata
Intelligent Information Management
Dipl.-Inf. André Langer
Prof. Dr.-Ing. Martin Gaedke
Providing collected research data sets for future reuse is good scientific practice (Open Research). However, assuring data quality for published research data and metadata in data repositories is not trivial: it often requires human interaction and tedious manual reviews, or is not done at all.
This project therefore focuses on data quality metrics for published research data sets. In a first step, relevant data quality criteria for the description of published research data sets have to be identified, guided by the FAIR principles and focusing on metadata descriptions for research data sets of arbitrary format, size, and shape. In a second step, it has to be assessed which of these criteria can be checked in an automated fashion. The concept can focus on metadata descriptions available in a Linked Data RDF serialization format, but should also provide a strategy for cases where other metadata description formats are used instead. Corresponding metrics have to be implemented as a proof of concept and applied in a data repository check-in workflow. An evaluation has to demonstrate the correctness and benefit of the approach based on existing research data registries such as OpenAIRE, re3data, or the Google Dataset Search.
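To illustrate what such an automatable metric might look like, the following is a minimal sketch of a metadata completeness score. The field names and the weighting scheme are assumptions for illustration only (loosely inspired by common DataCite/DCAT properties); the project itself would operate on RDF metadata descriptions and a criteria catalogue still to be defined.

```python
# Sketch of an automated metadata completeness metric.
# REQUIRED_FIELDS / RECOMMENDED_FIELDS and the 2:1 weighting are
# illustrative assumptions, not part of any standard.

REQUIRED_FIELDS = ["identifier", "title", "creator", "publisher", "license"]
RECOMMENDED_FIELDS = ["description", "keywords", "issued", "accessURL"]

def completeness(metadata: dict) -> float:
    """Return a score in [0, 1]; required fields count twice as much
    as recommended ones."""
    def present(field: str) -> bool:
        value = metadata.get(field)
        return value is not None and str(value).strip() != ""

    req_hits = sum(present(f) for f in REQUIRED_FIELDS)
    rec_hits = sum(present(f) for f in RECOMMENDED_FIELDS)
    max_score = 2 * len(REQUIRED_FIELDS) + len(RECOMMENDED_FIELDS)
    return (2 * req_hits + rec_hits) / max_score

# Example record: "publisher" is missing, so the score stays below 1.
record = {
    "identifier": "doi:10.1234/example",
    "title": "Example data set",
    "creator": "Doe, Jane",
    "license": "CC-BY-4.0",
    "description": "Synthetic example record.",
}
print(round(completeness(record), 2))  # → 0.64
```

Such a score could be computed automatically during a repository check-in and reported back to the depositor, whereas criteria like the factual accuracy of a description would still need human review.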