Master's Thesis
Multi-Agent Template Filling using Large Language Models: Design and Comparative Evaluation with Speech Input Case Study
Completion
2025/12
Research Area
Students
Huzefa Ismail Jadliwala
Advisers
Verena Traubinger M.Sc.
Dr.-Ing. Sebastian Heil
Description
In domains such as healthcare, manufacturing, and administration, converting unstructured spoken language into structured forms is a critical yet challenging task. Traditional approaches using large language models (LLMs) typically employ monolithic systems that attempt to complete the entire template filling process in a single step. While computationally efficient, these systems are often brittle: they struggle with incomplete or ambiguous inputs, lack modularity for domain adaptation, and provide limited transparency when dealing with domain-specific terminology, noisy transcriptions, or real-world operational conditions.
This thesis proposes a modular, multi-agent approach to template filling, implemented through the Invox system. Instead of treating the problem as a single LLM task, Invox decomposes it into subtasks—transcription interpretation, field mapping, value inference, and result verification—each handled by dedicated LLM agents. The system leverages state-of-the-art components such as Whisper for transcription, GPT-4 and Claude for reasoning, and DeepSeek for semantic validation. Five architectural strategies are explored: (1) Single-Pass Full Input, (2) Iterative Single-Field Processing, (3) Multi-LLM Consensus (Full), (4) Multi-LLM Consensus (Iterative), and (5) Hybrid Refinement. These differ in terms of prompt structure, processing granularity, and verification mechanisms. All approaches are evaluated using the same criteria: accuracy, consistency, latency, cost-efficiency, and modularity. Datasets include the benchmark MUC-4 corpus and a real-world industrial dataset from steel manufacturing shift reports. The goal is not only to assess individual method performance, but to better understand the trade-offs introduced by modular, agent-based LLM systems in real-world deployment contexts.
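To make the difference between processing granularities more concrete, the following minimal sketch contrasts iterative single-field processing with an iterative multi-LLM consensus step. It is an illustration only, not the Invox implementation: the `Agent` type, `make_stub_agent`, `fill_iteratively`, `fill_with_consensus`, and the `min_votes` threshold are hypothetical names, and deterministic stub functions stand in for real model calls.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional

# Hypothetical agent interface (an assumption, not the Invox API):
# an agent receives the transcript and one field name and returns a value or None.
Agent = Callable[[str, str], Optional[str]]


def make_stub_agent(answers: Dict[str, str]) -> Agent:
    """Build a deterministic stand-in for an LLM agent from a fixed answer table."""
    def agent(transcript: str, field: str) -> Optional[str]:
        # A real agent would prompt a model here; the stub ignores the transcript.
        return answers.get(field)
    return agent


def fill_iteratively(
    transcript: str, fields: List[str], agent: Agent
) -> Dict[str, Optional[str]]:
    """Iterative single-field processing: one agent call per template field."""
    return {field: agent(transcript, field) for field in fields}


def fill_with_consensus(
    transcript: str, fields: List[str], agents: List[Agent], min_votes: int = 2
) -> Dict[str, Optional[str]]:
    """Iterative consensus: keep a value only if at least min_votes agents agree."""
    result: Dict[str, Optional[str]] = {}
    for field in fields:
        answers = (agent(transcript, field) for agent in agents)
        votes = Counter(value for value in answers if value is not None)
        if votes:
            value, count = votes.most_common(1)[0]
            result[field] = value if count >= min_votes else None
        else:
            # No agent proposed a value, so the field stays empty.
            result[field] = None
    return result


if __name__ == "__main__":
    transcript = "Night shift, problem at furnace two."  # invented example input
    agents = [
        make_stub_agent({"shift": "night", "machine": "furnace 2"}),
        make_stub_agent({"shift": "night", "machine": "furnace 3"}),
        make_stub_agent({"shift": "night"}),
    ]
    print(fill_with_consensus(transcript, ["shift", "machine", "defect"], agents))
```

In this sketch, a field on which the agents disagree is left empty rather than filled with a guess. That is one possible verification policy among several; a hybrid refinement strategy could instead pass the disputed field on to a further agent.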
The objective of this thesis is to create a solution, or to combine existing approaches, for the problem described above: filling out templates with the help of large language models. The work comprises three parts. First, an analysis of the state of the art on LLMs and prompt engineering, multi-agent systems, tools for filling out templates, and other relevant work. Second, the implementation and comparison of five different approaches to a solution. Third, a suitable evaluation in which the approaches are tested on datasets against a set of benchmarks and against the requirements elicited from the literature review.
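As an illustration of how two of the evaluation criteria could be made operational, the sketch below computes a field-level accuracy against a gold template and a simple consistency score over repeated runs. Both definitions are assumptions made for this example, not the metrics fixed by the thesis, and the names `field_accuracy` and `run_consistency` are hypothetical.

```python
from typing import Dict, List, Optional

Template = Dict[str, Optional[str]]


def _normalize(value: Optional[str]) -> Optional[str]:
    """Compare values case-insensitively and without surrounding whitespace."""
    return value.strip().lower() if isinstance(value, str) else value


def field_accuracy(predicted: Template, gold: Template) -> float:
    """Fraction of gold fields whose predicted value matches after normalization."""
    if not gold:
        return 0.0
    correct = sum(
        1
        for field, expected in gold.items()
        if _normalize(predicted.get(field)) == _normalize(expected)
    )
    return correct / len(gold)


def run_consistency(runs: List[Template]) -> float:
    """Fraction of fields for which all repeated runs returned the same value."""
    if not runs:
        return 0.0
    fields = set().union(*(run.keys() for run in runs))
    if not fields:
        return 0.0
    stable = sum(1 for field in fields if len({run.get(field) for run in runs}) == 1)
    return stable / len(fields)
```

Exact matching after normalization is a deliberately strict choice. For free-text fields, a softer comparison (for example, token overlap) may be more appropriate.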


