VISH: Does Your Smart Home Dialogue System Also Need Training Data?
Web Engineering. ICWE 2020. Lecture Notes in Computer Science, vol 12128
The main objective of smart homes is to improve the quality of life and comfort of their inhabitants through automation systems and ambient intelligence. Voice-based interaction, for example through dialogue systems, is an emerging trend in these systems. A Natural Language Understanding (NLU) model identifies the end-users' intentions in the utterances provided to a spoken dialogue system. The utility of dialogue systems relies on the quality of their NLU models, which in turn depends significantly on the availability of a high-quality and sufficiently large training corpus containing diverse utterance structures. However, building such corpora is a complex task even for companies possessing significant human and infrastructure resources. Moreover, the existing corpora for the smart home domain either concern web services, focus on direct goals only, follow a static command structure, or are not publicly available in English, which limits the development of goal-oriented dialogue systems for smart homes. In this paper, we propose a generic method to create training data for the NLU component using a generative grammar-based approach. Our method outputs the Voice Interaction in Smart Home (VISH) dataset, consisting of five million unique utterances for the smart home. This dataset can greatly facilitate research in the area of voice-based dialogue systems for smart homes. We evaluate the approach by using VISH to train several state-of-the-art NLU models. Our experimental results demonstrate the capability of the corpus to support the development of goal-oriented voice-based dialogue systems in the context of smart homes.
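To make the idea of grammar-based utterance generation concrete, the following is a minimal sketch and not the authors' method or grammar. It assumes a hypothetical toy grammar (the names GRAMMAR, expand, and generate, and all rules and vocabulary, are invented for illustration) and enumerates every utterance the grammar can derive.

```python
from itertools import product

# Hypothetical toy grammar (an assumption for illustration, not the VISH grammar).
# Each nonterminal maps to a list of alternatives; each alternative is a
# sequence of symbols. Symbols that are not keys are treated as literal text.
GRAMMAR = {
    "UTTERANCE": [
        ["ACTION", "the", "DEVICE", "in the", "ROOM"],
        ["please", "ACTION", "the", "ROOM", "DEVICE"],
    ],
    "ACTION": [["turn on"], ["turn off"]],
    "DEVICE": [["light"], ["heater"]],
    "ROOM": [["kitchen"], ["bedroom"]],
}


def expand(symbol):
    """Yield every string that can be derived from the given symbol."""
    if symbol not in GRAMMAR:
        # Terminal: emit the literal text unchanged.
        yield symbol
        return
    for alternative in GRAMMAR[symbol]:
        # Expand each symbol of the alternative, then combine all choices.
        expansions = [list(expand(s)) for s in alternative]
        for parts in product(*expansions):
            yield " ".join(parts)


def generate(start="UTTERANCE"):
    """Return the sorted set of unique utterances derivable from start."""
    return sorted(set(expand(start)))


utterances = generate()
for utterance in utterances:
    print(utterance)
```

Even this small grammar shows why the approach scales: each added alternative multiplies the number of distinct utterance structures, so a modest rule set can yield a large corpus. A real corpus would also need intent and slot annotations attached to each utterance, which this sketch leaves out.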
Noura, Mahda; Heil, Sebastian; Gaedke, Martin: VISH: Does Your Smart Home Dialogue System Also Need Training Data? Web Engineering. ICWE 2020. Lecture Notes in Computer Science, vol 12128, pp. 171–187, 2020.