Defining a Linked Data Metadata Profile for the Fine-Grained Description of Scientific Data Sets
The digitalization of science has also opened up new possibilities for collaboratively reusing existing research data (Open Research / Open Data). Projects such as Google's Dataset Search or Re3Data.org show that many data publication platforms already provide datasets with accurate metadata descriptions. However, finding datasets for a particular research problem is still difficult, especially in an interdisciplinary context. One reason might be that the provided information focuses primarily on discovery metadata, such as provenance information, and less on structured data that describe the content of a dataset in a homogeneous fashion. A collection of possible metadata properties for research data was recently started via schema.org/Dataset, but it is assumed that this vocabulary can still be improved.
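To make the distinction between discovery metadata and content-describing metadata concrete, the following minimal sketch builds a JSON-LD description using real schema.org/Dataset properties and sorts them into the two groups. The dataset values (name, creator, coordinates, dates) are invented for illustration, and the split into "discovery" and "content" properties is an assumption made for this example, not a classification defined by schema.org.

```python
import json

# Assumed grouping: properties that mainly help a user find and cite a dataset.
DISCOVERY_PROPERTIES = {"name", "creator", "publisher", "datePublished", "license", "url"}

# Assumed grouping: properties that say something about what the dataset contains.
CONTENT_PROPERTIES = {"variableMeasured", "measurementTechnique",
                      "spatialCoverage", "temporalCoverage", "keywords"}


def build_dataset_description() -> dict:
    """Return a hypothetical schema.org/Dataset description as a JSON-LD dict."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        # Discovery metadata (all values are invented for illustration)
        "name": "Hourly air temperature, example station",
        "creator": {"@type": "Person", "name": "Jane Doe"},
        "datePublished": "2020-01-15",
        "license": "https://creativecommons.org/licenses/by/4.0/",
        # Content-describing metadata
        "variableMeasured": {
            "@type": "PropertyValue",
            "name": "air temperature",
            "unitText": "degree Celsius",
        },
        "measurementTechnique": "thermometer",
        "temporalCoverage": "2019-01-01/2019-12-31",  # ISO 8601 interval
        "spatialCoverage": {
            "@type": "Place",
            "geo": {"@type": "GeoCoordinates", "latitude": 50.93, "longitude": 11.59},
        },
        "keywords": ["meteorology", "air temperature"],
    }


def split_by_kind(description: dict) -> tuple[list[str], list[str]]:
    """Sort the properties used in a description into discovery and content lists."""
    used = set(description) - {"@context", "@type"}
    return sorted(used & DISCOVERY_PROPERTIES), sorted(used & CONTENT_PROPERTIES)


if __name__ == "__main__":
    desc = build_dataset_description()
    print(json.dumps(desc, indent=2))
    discovery, content = split_by_kind(desc)
    print("discovery:", discovery)
    print("content:", content)
```

A description that fills only the discovery group can be found by title or author, but a search for "air temperature measured in 2019" needs the content-describing properties to be present and homogeneously filled.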
This project focuses on metadata standards for describing scientific datasets for interdisciplinary publication and retrieval. After defining the term (research) dataset and the requirements for a dataset publication, a state-of-the-art analysis has to identify existing Linked Data vocabularies that describe the domain, nature, and content of a dataset from multiple perspectives. Next, an approach has to be developed for comparing the most relevant concepts in these vocabularies and for extracting properties that better annotate datasets for interdisciplinary reuse. Based on these findings, a metadata profile can be proposed that specifies how researchers should annotate their datasets in the future. Finally, an evaluation has to demonstrate the profile's practical use and its improvement over currently proposed vocabularies.
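One naive way to start comparing concepts across vocabularies is a hand-written crosswalk: each shared concept is mapped to the term a vocabulary offers for it, and coverage gaps and a candidate profile are derived from that table. The sketch below assumes such a crosswalk between schema.org and DCAT; the concept labels, the selection of terms, and the missing entries are simplifications chosen for illustration, not an authoritative alignment and not the comparison approach the project will develop.

```python
from collections import defaultdict

# Hypothetical, simplified crosswalk. Missing entries are illustrative
# simplifications, not verified claims about what a vocabulary lacks.
CROSSWALK = {
    "title": {"schema.org": "schema:name", "DCAT": "dct:title"},
    "keywords": {"schema.org": "schema:keywords", "DCAT": "dcat:keyword"},
    "spatial extent": {"schema.org": "schema:spatialCoverage", "DCAT": "dct:spatial"},
    "temporal extent": {"schema.org": "schema:temporalCoverage", "DCAT": "dct:temporal"},
    "measured variable": {"schema.org": "schema:variableMeasured"},
    "measurement method": {"schema.org": "schema:measurementTechnique"},
}


def concepts_by_vocabulary(crosswalk: dict) -> dict[str, set[str]]:
    """Invert the crosswalk: for each vocabulary, the set of concepts it covers."""
    covered: dict[str, set[str]] = defaultdict(set)
    for concept, terms in crosswalk.items():
        for vocab in terms:
            covered[vocab].add(concept)
    return dict(covered)


def gaps(crosswalk: dict, vocab: str) -> list[str]:
    """Concepts in the crosswalk that the given vocabulary has no term for."""
    covered = concepts_by_vocabulary(crosswalk).get(vocab, set())
    return sorted(set(crosswalk) - covered)


def propose_profile(crosswalk: dict, preference: list[str]) -> dict[str, str]:
    """Pick one term per concept, taking the first vocabulary in `preference` that has one."""
    profile = {}
    for concept, terms in crosswalk.items():
        for vocab in preference:
            if vocab in terms:
                profile[concept] = terms[vocab]
                break
    return profile


if __name__ == "__main__":
    print("DCAT gaps:", gaps(CROSSWALK, "DCAT"))
    for concept, term in propose_profile(CROSSWALK, ["DCAT", "schema.org"]).items():
        print(f"{concept:20} -> {term}")
```

Preferring one vocabulary and filling its gaps from another is only one possible design; a real profile would also need to weigh term semantics, community adoption, and how well each term supports interdisciplinary retrieval.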