Distributed and Self-organizing Systems

Master's Thesis

Building a Web Crawler to collect a dataset of websites with chatbots

Research Area

Web Engineering

Advisers

traubinger

gaedke

Description

Chatbots are an emerging technology that has been used increasingly in recent years to interact with users on websites. Research on chatbots in the domain of human-computer interaction is currently spread over several areas, covering both the graphical and the conversational side of the interface. As of today, no dataset of websites with chatbots is available that could serve as a common basis for research "in the wild". Most existing datasets either concentrate on one specific way of integrating chatbots or contain conversations from one specific chatbot. Thus, there is a need for a holistic dataset of websites with chatbots.

The task is to research web crawlers and then to build and evaluate a web crawler that addresses the problem described above. The crawler should find chatbots of the following types (and their variations): chatbots provided by various third-party platforms, chatbots developed in-house and embedded in the website itself, and websites that connect to generative AI chatbots via an API. The following chatbots may be included but are not a focus: chatbots on social media platforms or social networks, and generative AI websites such as Bard or ChatGPT.

The thesis must include a state-of-the-art review of current web crawler technology and of available chatbot datasets, evaluated against previously elicited requirements. Based on this evaluation, a concept for the web crawler has to be designed and implemented. The feasibility of the web crawler has to be evaluated according to the number of instances found, the coverage of the aforementioned chatbot types, and the breadth of the dataset. The crawler should also be compared against currently available web crawlers that pursue similar goals.
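One possible starting point for the detection step is to look for simple signatures of the three in-scope chatbot types in a page's HTML. The Python sketch below is only an illustration under stated assumptions, not a prescribed design: the host names and keywords are assumed example signatures chosen for this sketch (they may be incomplete or outdated), and a generative AI integration is only visible this way if the page calls the API directly from the browser.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Assumed example signatures of third-party chatbot widgets (script host -> platform).
# A real crawler would need a curated, regularly updated list.
THIRD_PARTY_HOSTS = {
    "widget.intercom.io": "Intercom",
    "js.driftt.com": "Drift",
    "code.tidio.co": "Tidio",
}

# Assumed generative AI API host; only detectable if referenced in the page source.
GENAI_HOSTS = ("api.openai.com",)

# Assumed keywords hinting at a self-built chat widget in id/class attributes.
CUSTOM_KEYWORDS = ("chatbot", "chat-bot", "chat-widget")


class ChatbotSignalParser(HTMLParser):
    """Collects external script hosts and id/class values from an HTML page."""

    def __init__(self):
        super().__init__()
        self.script_hosts = set()
        self.attr_values = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script" and attrs.get("src"):
            # Handles absolute URLs and protocol-relative ones like //host/path.
            host = urlparse(attrs["src"]).hostname
            if host:
                self.script_hosts.add(host.lower())
        for name in ("id", "class"):
            value = attrs.get(name)
            if value:
                self.attr_values.append(value.lower())


def classify_chatbots(html):
    """Return detected signals grouped by the three target chatbot types."""
    parser = ChatbotSignalParser()
    parser.feed(html)
    found = {"third_party": [], "custom": [], "generative_ai_api": []}

    # Type 1: widget scripts loaded from known third-party hosts.
    for host, platform in THIRD_PARTY_HOSTS.items():
        if host in parser.script_hosts:
            found["third_party"].append(platform)

    # Type 2: self-built widgets, guessed from id/class keywords.
    if any(k in v for v in parser.attr_values for k in CUSTOM_KEYWORDS):
        found["custom"].append("keyword match in id/class")

    # Type 3: generative AI APIs referenced anywhere in the raw page source.
    lowered = html.lower()
    for host in GENAI_HOSTS:
        if host in lowered:
            found["generative_ai_api"].append(host)
    return found


if __name__ == "__main__":
    page = (
        '<script src="//widget.intercom.io/widget/abc"></script>'
        '<div id="support-chatbot"></div>'
        '<script>fetch("https://api.openai.com/v1/chat/completions")</script>'
    )
    print(classify_chatbots(page))
```

A crawler would call such a function for every fetched page and store the non-empty results. Widgets injected by JavaScript after page load and purely server-side AI integrations would be missed; detecting those would require rendering pages in a headless browser or other techniques that this sketch does not attempt.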
