Distributed and Self-organizing Systems
Resource-Constrained SPARQL Query Generation from Natural Language

Master's Thesis

Resource-Constrained SPARQL Query Generation from Natural Language

Research Area

Intelligent Information Management / Web Engineering

Students

Advisers

Description

Knowledge graphs organize data in a structured manner, with rich semantic annotations that capture the relations between individual data points. Querying, analyzing, and processing this data requires SPARQL, a query language designed for pattern matching on RDF-based graphs. However, writing syntactically and semantically correct SPARQL queries requires knowledge of SPARQL's syntax and semantics as well as an understanding of the ontology that organizes the data.
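To illustrate what pattern matching on an RDF-based graph means, the following minimal sketch evaluates a single triple pattern over a small in-memory list of triples. It is a toy stand-in for a SPARQL engine, not a description of any particular implementation. The triples, the function name match_pattern, and the DBpedia-style prefixed names are assumptions chosen for illustration.

```python
# Toy in-memory graph of (subject, predicate, object) triples.
# The prefixed names imitate DBpedia style but are illustrative only.
TRIPLES = [
    ("dbr:Berlin", "dbo:country", "dbr:Germany"),
    ("dbr:Chemnitz", "dbo:country", "dbr:Germany"),
    ("dbr:Paris", "dbo:country", "dbr:France"),
]

# The pattern used below corresponds to the SPARQL query:
#   SELECT ?city WHERE { ?city dbo:country dbr:Germany . }


def match_pattern(triples, pattern):
    """Return one variable binding (a dict) per triple that matches the pattern.

    Terms starting with '?' are variables; all other terms must match exactly.
    """
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable repeated in the pattern must bind to the same value.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            # No break occurred, so every term of the pattern matched.
            results.append(binding)
    return results


bindings = match_pattern(TRIPLES, ("?city", "dbo:country", "dbr:Germany"))
cities = [b["?city"] for b in bindings]
print(cities)  # ['dbr:Berlin', 'dbr:Chemnitz']
```

Even this one-pattern query presupposes that the user knows the ontology, here that the relation is called dbo:country, which is the barrier described above.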

Recently, LLMs have been introduced as a way to generate SPARQL queries from natural language. This significantly lowers the barrier to entry for using knowledge graphs efficiently, as users can explore the contained information and formulate questions about it in natural language. This process is referred to as knowledge graph question answering (KGQA). According to benchmarks, existing approaches typically produce accurate answers; however, smaller language models perform worse on this task. Some institutions that depend on knowledge graphs may not have the computational resources required to deploy and integrate language models with high parameter counts. Under these constraints, such institutions cannot benefit from recent progress in KGQA.

This master's thesis explores hybrid approaches for effective KGQA that combine rule- and embedding-based methods with LLMs, with the aim of enabling deployment in environments with limited resources. The first objective of this thesis is to clarify the challenges related to KGQA in low-resource environments. Then, a review of existing research and methods for translating natural language input into SPARQL should be conducted, focusing on a resource-constrained context. Afterwards, a concept should be developed that addresses the previously identified challenges, and a prototypical solution should be implemented to demonstrate the concept's feasibility. Finally, the correctness of the generated SPARQL queries should be evaluated through a structured assessment using the DBpedia endpoint.
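One possible reading of such a hybrid approach is sketched below: a cheap rule-based template is tried first, and a language model is consulted only when no rule applies. The sketch also shows one conceivable way to score correctness, by comparing the answer set of a generated query with a reference answer set using F1. The rule, the template, the function names, the stub model, and the choice of F1 are all assumptions for illustration. The thesis description prescribes none of them, and the embedding-based component is omitted.

```python
import re
from typing import Callable, Optional, Set, Tuple

# Hypothetical rule: questions of the form "Which cities are in <Country>?"
# map onto a fixed SPARQL template. The pattern, the template, and the
# dbo:/dbr: names are illustrative assumptions.
CITY_RULE = re.compile(r"^which cities are in (?P<country>[A-Za-z ]+)\?$", re.IGNORECASE)
CITY_TEMPLATE = "SELECT ?city WHERE {{ ?city dbo:country dbr:{country} . }}"


def rule_based_query(question: str) -> Optional[str]:
    """Return a templated SPARQL query, or None if no rule matches."""
    match = CITY_RULE.match(question.strip())
    if match is None:
        return None
    # Turn "united states" into the resource-style name "United_States".
    country = match.group("country").strip().title().replace(" ", "_")
    return CITY_TEMPLATE.format(country=country)


def generate_query(question: str, fallback: Callable[[str], str]) -> Tuple[str, str]:
    """Try the rule first; call the (more expensive) fallback only if needed."""
    query = rule_based_query(question)
    if query is not None:
        return query, "rule"
    return fallback(question), "model"


def answer_f1(predicted: Set[str], gold: Set[str]) -> float:
    """F1 score between the predicted and the reference answer set."""
    if not predicted or not gold:
        return 1.0 if predicted == gold else 0.0
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def stub_model(question: str) -> str:
    """Hypothetical stand-in for a small language model."""
    return "SELECT * WHERE { ?s ?p ?o } LIMIT 1"


print(generate_query("Which cities are in Germany?", stub_model))
print(generate_query("Who wrote Faust?", stub_model)[1])
```

In an actual evaluation, the two answer sets would come from executing the generated query and a reference query against the DBpedia endpoint; that step is left out here so the sketch runs without network access.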