Taking disagreements into consideration: human annotation variability in privacy policy analysis
DOI: https://doi.org/10.47989/ir30iConf47581

Keywords: privacy policy, annotator disagreement, natural language processing, machine learning, human label variation

Abstract
Introduction. Privacy policies inform users about data practices but are often complex and difficult to interpret. Human annotation plays a key role in understanding privacy policies, yet annotation disagreements highlight the complexity of these texts. Traditional machine learning models prioritize consensus, overlooking annotation variability and its impact on accuracy.
Method. This study examines how annotation disagreements affect machine learning performance using the OPP-115 corpus. It compares majority vote and union methods with alternative strategies to assess their impact on policy classification.
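To make the two aggregation strategies named in the Method concrete, the following minimal Python sketch contrasts them on a single hypothetical policy segment. The segment labels, annotator structure, and function names are illustrative assumptions, not the study's actual implementation; OPP-115 category names are used only as example values.

```python
from collections import Counter

# Hypothetical labels for one policy segment: each inner list holds the
# OPP-115-style categories assigned by one annotator.
annotations = [
    ["First Party Collection/Use"],
    ["First Party Collection/Use", "Third Party Sharing/Collection"],
    ["First Party Collection/Use"],
]

def union_labels(annotations):
    """Union strategy: keep every category that any annotator assigned."""
    return {label for labels in annotations for label in labels}

def majority_vote_labels(annotations):
    """Majority-vote strategy: keep categories assigned by more than half of the annotators."""
    counts = Counter(label for labels in annotations for label in labels)
    threshold = len(annotations) / 2
    return {label for label, count in counts.items() if count > threshold}

print(union_labels(annotations))
# e.g. {'First Party Collection/Use', 'Third Party Sharing/Collection'}
print(majority_vote_labels(annotations))
# e.g. {'First Party Collection/Use'}
```

The union strategy retains minority judgments, while majority vote discards them; the study's comparison turns on how these choices affect downstream classification.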
Analysis. The study evaluates whether increasing annotator consensus improves model effectiveness and whether disagreement-aware approaches yield more reliable results.
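A simple way to run the analysis described above is to group test segments by their level of annotator agreement and score each group separately. The sketch below, with hypothetical data and an assumed binary per-category setup, illustrates this using scikit-learn's F1 metric; it is not the authors' evaluation code.

```python
from sklearn.metrics import f1_score

def f1_by_agreement(y_true, y_pred, agreement_levels):
    """Compute F1 separately for each annotator-agreement level.

    y_true, y_pred, and agreement_levels are parallel lists over test segments.
    """
    scores = {}
    for level in sorted(set(agreement_levels)):
        idx = [i for i, a in enumerate(agreement_levels) if a == level]
        scores[level] = f1_score(
            [y_true[i] for i in idx],
            [y_pred[i] for i in idx],
            average="binary",
        )
    return scores

# Toy example: 1 = segment carries the category, 0 = it does not.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]
agreement = ["2/3", "2/3", "2/3", "3/3", "3/3", "3/3"]
print(f1_by_agreement(y_true, y_pred, agreement))
# e.g. {'2/3': 0.67, '3/3': 1.0}
```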
Results. Higher agreement levels improve model performance across most categories. Complete agreement yields the best F1-scores, especially for First Party Collection/Use and Third Party Sharing/Collection. Annotation disagreements significantly affect classification outcomes, underscoring the need to account for them when evaluating models.
Conclusions. Ignoring annotation disagreements can misrepresent model accuracy. This study proposes new evaluation strategies that account for annotation variability, offering a more realistic approach to privacy policy analysis. Future work should explore the causes of annotation disagreements to improve machine learning transparency and reliability.
License
Copyright (c) 2025 Tian Wang, Yuanye Ma, Catherine Blake, Masooda Bashir

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.