Automated Annotation of Computer Science Research Data using Taxonomies
Intelligent Information Management
Christoph Göpfert M.Sc.
Prof. Dr.-Ing. Martin Gaedke
Open Science encourages the transparent dissemination of research data. Research data repositories provide the essential infrastructure for archiving and sharing data in a way that facilitates reuse. However, as the volume of research data grows, it becomes harder for researchers to find relevant data they can reuse: the sheer amount can be overwhelming and tedious to navigate.
One potential way to make research data easier to discover after publication is to annotate it automatically with semantic concepts from curated taxonomies. These concepts can describe key characteristics of the research data in a structured, machine-readable way. The input for automated annotation may include the dataset's title and description, as well as other metadata related to the research data.
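To make the idea concrete, the following is a minimal sketch of annotation by label matching, not the approach the thesis is expected to deliver. It assumes a tiny, made-up taxonomy; the `Concept` class, the `annotate` function, and the `ex:` identifiers are all hypothetical and serve illustration only. A real solution would load a curated computer science taxonomy and would likely use more robust matching than exact label lookup.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    concept_id: str  # hypothetical identifier, not taken from a real taxonomy
    labels: tuple    # preferred label followed by alternative labels


# Made-up example taxonomy (assumption for illustration only).
TAXONOMY = (
    Concept("ex:machine-learning", ("machine learning", "ml")),
    Concept("ex:information-retrieval", ("information retrieval",)),
    Concept("ex:software-testing", ("software testing", "unit testing")),
)


def annotate(title, description, taxonomy=TAXONOMY):
    """Return IDs of concepts whose labels occur as whole words in the metadata."""
    text = f"{title} {description}".lower()
    matches = []
    for concept in taxonomy:
        for label in concept.labels:
            # Word boundaries prevent short labels from matching inside longer words.
            if re.search(rf"\b{re.escape(label)}\b", text):
                matches.append(concept.concept_id)
                break  # one matching label per concept is enough
    return matches


annotations = annotate(
    "Benchmark for Machine Learning Models",
    "Labeled queries for evaluating information retrieval systems.",
)
print(annotations)
```

The resulting concept identifiers could then be stored alongside the dataset's existing metadata, so that a repository search can filter by concept rather than by free text alone.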
The objective of this master's thesis is to develop such an automated annotation approach using curated taxonomies of the computer science domain. The approach is intended to improve metadata descriptions in line with the FAIR principles (Findable, Accessible, Interoperable, Reusable). The first step is to conduct a requirements analysis and to investigate the state of the art in automated metadata annotation. Existing solutions are then to be classified and evaluated against the identified requirements. Finally, an approach for the automated annotation of research data with semantic concepts from computer science taxonomies is to be designed, implemented, and evaluated against the requirements.