Web User Interface as a Message: Power Law for Fraud Detection in Crowdsourced Labeling
Proceedings of the 21st International Conference on Web Engineering
Web Engineering is becoming increasingly hungry for training data as the application of machine learning (ML) methods in the field intensifies. Human-labeled datasets are particularly indispensable for ML-based validation and design of user interfaces (UIs). The production of such datasets is often outsourced to crowdworkers, who typically have lower motivation and payment than in-house staff, so the quality of their work becomes the paramount concern. In our paper, we explore the applicability of a trending fraud detection approach, based on fit to a power law, to crowdsourced web UI labeling. On Amazon Mechanical Turk, 298 crowdworkers labeled over 30,000 UI elements in about 500 university homepage screenshots. We found a significant correlation between workers’ precision and a Kolmogorov-Smirnov statistic-based goodness of fit between the frequencies of UI elements in a worker’s output and a power law. The obtained R² = 0.504 was higher than the baseline R² = 0.432 for the popular time-on-task parameter. Moreover, the distribution of UI elements’ frequencies is much less prone to manipulation by malicious crowdworkers, which is advantageous for a crowdsourced data quality control measure. The findings of our study suggest a certain resemblance between web UIs and natural language texts, in which word frequencies are known to comply with Zipf’s law.
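The goodness-of-fit idea described above can be illustrated with a minimal sketch. It is not the procedure used in the paper: the abstract does not specify how the power law is fitted, so the sketch assumes a Zipf-style rank-frequency form and estimates the exponent with a simple least-squares fit in log-log space. The function name `power_law_ks` and the input format (a flat list of UI element class labels produced by one worker) are hypothetical.

```python
import math
from collections import Counter


def power_law_ks(labels):
    """Return (alpha, ks) for one worker's labels.

    alpha: exponent of a rank-frequency power law, estimated by
           least squares on log(count) versus log(rank) (an assumption,
           not necessarily the fitting method of the paper).
    ks:    Kolmogorov-Smirnov distance between the empirical and the
           fitted cumulative distributions over ranks.
    """
    # Counts per UI element class, most frequent first (rank 1, 2, ...).
    counts = sorted(Counter(labels).values(), reverse=True)
    n = len(counts)
    if n < 3:
        raise ValueError("need at least three distinct element classes")
    total = sum(counts)

    # Least-squares slope of log(count) against log(rank).
    xs = [math.log(rank) for rank in range(1, n + 1)]
    ys = [math.log(c) for c in counts]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    alpha = -sxy / sxx

    # Fitted probability mass over ranks, normalised to sum to 1.
    weights = [rank ** -alpha for rank in range(1, n + 1)]
    norm = sum(weights)

    # KS distance: largest gap between the two cumulative distributions.
    ks = 0.0
    empirical_cdf = 0.0
    model_cdf = 0.0
    for c, w in zip(counts, weights):
        empirical_cdf += c / total
        model_cdf += w / norm
        ks = max(ks, abs(empirical_cdf - model_cdf))
    return alpha, ks
```

Under these assumptions, a smaller `ks` means the worker's element frequencies lie closer to a power law. In a quality-control setting, the value would be computed per worker and compared against a threshold or correlated with known precision, as the study does.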
Heil, Sebastian; Bakaev, Maxim; Gaedke, Martin: Web User Interface as a Message: Power Law for Fraud Detection in Crowdsourced Labeling. Proceedings of the 21st International Conference on Web Engineering, 2021.