Analyzing the language of rejection: a study of user flagging responses to hate speech on Reddit
DOI: https://doi.org/10.47989/ir30iConf47191

Keywords: flagging messages, linguistic characteristics, natural language processing

Abstract
Introduction. Online hate speech poses significant threats to individuals and society, causing psychological harm, reinforcing discrimination, and potentially inciting real-world violence. While automated detection models are available, their inability to recognize subtle variations of hate speech, particularly implicit forms, underscores the need for supplementary methods.
Method. This study investigates the potential of user-written flagging messages to enhance hate speech detection, focusing on their linguistic characteristics and automatic identification. We created a dataset of flagging messages paired with the comments they respond to, and employed transformer-based models (BERT, RoBERTa, ALBERT, DistilBERT, and XLNet) for classification.
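As a rough illustration of this classification setup, the sketch below fine-tunes BERT to distinguish flagging messages from other comments using the Hugging Face transformers library; the file name, column names, and hyperparameters are hypothetical stand-ins, not the authors' actual configuration.

# Minimal sketch: fine-tuning BERT to classify flagging vs. non-flagging messages.
# "flagging_messages.csv" and its "text"/"label" columns are hypothetical placeholders.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Load the labelled comments (1 = flagging message, 0 = other comment) and split.
data = load_dataset("csv", data_files="flagging_messages.csv")["train"].train_test_split(test_size=0.2)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

data = data.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="flagging-bert", num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=data["train"],
    eval_dataset=data["test"],
)
trainer.train()
print(trainer.evaluate())

The same pipeline applies to the other architectures by swapping the pretrained checkpoint name.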
Analysis. Linguistic analysis using SEANCE (Sentiment Analysis and Social Cognition Engine) and Named Entity Recognition was conducted to reveal the distinctive characteristics of flagging messages.
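For the entity analysis, a minimal sketch is shown below using spaCy; the toolkit choice and example text are illustrative assumptions, as the abstract does not specify which NER implementation was used.

# Minimal sketch: extracting named entities from a flagging message with spaCy.
# The toolkit and example text are illustrative only.
import spacy

nlp = spacy.load("en_core_web_sm")  # requires: python -m spacy download en_core_web_sm
doc = nlp("Reporting this comment: it attacks people from Mexico. Please remove it, mods.")
for ent in doc.ents:
    print(ent.text, ent.label_)  # e.g., "Mexico" -> GPE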
Results. Our findings show that BERT and DistilBERT models achieved the highest accuracy in classifying flagging messages, with distinct linguistic patterns emerging in flagging content.
Conclusions. This research contributes to the development of more nuanced hate speech detection methods by leveraging user-generated flagging content. These findings have implications for improving automated content moderation systems and supporting more inclusive online environments. Future work will focus on the effectiveness of flagging messages in identifying implicit hate speech across diverse cultural contexts.
License
Copyright (c) 2025 Sharon Lisseth Perez, Xiaoying Song, Lingzi Hong

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
