<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" "http://jats.nlm.nih.gov/publishing/1.0/JATS-journalpublishing1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" article-type="research-article" xml:lang="en">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">IR</journal-id>
<journal-title-group>
<journal-title>Information Research</journal-title>
</journal-title-group>
<issn pub-type="epub">1368-1613</issn>
<publisher>
<publisher-name>University of Bor&#x00E5;s</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">ir30iConf47191</article-id>
<article-id pub-id-type="doi">10.47989/ir30iConf47191</article-id>
<article-categories>
<subj-group xml:lang="en">
<subject>Research article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Analyzing the language of rejection: a study of user flagging responses to hate speech on Reddit</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author"><name><surname>Perez</surname><given-names>Sharon Lisseth</given-names></name>
<xref ref-type="aff" rid="aff0001"/></contrib>
<contrib contrib-type="author"><name><surname>Song</surname><given-names>Xiaoying</given-names></name>
<xref ref-type="aff" rid="aff0002"/></contrib>
<contrib contrib-type="author"><name><surname>Hong</surname><given-names>Lingzi</given-names></name>
<xref ref-type="aff" rid="aff0003"/></contrib>
<aff id="aff0001"><bold>Sharon Lisseth Perez</bold> is a PhD student in the College of Information at the University of North Texas, Texas, USA. Her research interests are on natural language processing, artificial intelligence, and their applications in education and social justice. She can be reached via email at <email xlink:href="sharonperez@my.unt.edu">sharonperez@my.unt.edu</email></aff>
<aff id="aff0002"><bold>Xiaoying Song</bold> is a PhD student in the College of Information at the University of North Texas, Texas, USA. Her research focuses on evaluating generative counter-speech and addressing health misinformation. She can be reached via email at <email xlink:href="xiaoyingsong@my.unt.edu">xiaoyingsong@my.unt.edu</email></aff>
<aff id="aff0003"><bold>Lingzi Hong</bold> is an Assistant Professor of Data Science in the College of Information at the University of North Texas. She holds a Ph.D. in Information Science from the University of Maryland, College Park. Her research focuses on human-centered computing and artificial intelligence. She can be reached at <email xlink:href="lingzi.hong@unt.edu">lingzi.hong@unt.edu</email></aff>
</contrib-group>
<pub-date pub-type="epub"><day>06</day><month>05</month><year>2025</year></pub-date>
<pub-date pub-type="collection"><year>2025</year></pub-date>
<volume>30</volume>
<issue>i</issue>
<fpage>815</fpage>
<lpage>823</lpage>
<permissions>
<copyright-year>2025</copyright-year>
<copyright-holder>&#x00A9; 2025 The Author(s).</copyright-holder>
<license license-type="open-access" xlink:href="https://creativecommons.org/licenses/by-nc/4.0/">
<license-p>This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by-nc/4.0/">http://creativecommons.org/licenses/by-nc/4.0/</ext-link>), permitting all non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<abstract xml:lang="en">
<title>Abstract</title>
<p><bold>Introduction</bold>. Online hate speech poses significant threats to individuals and society, exacerbating psychological harm, discrimination, and potential real-world violence. While automated detection models are available, their inability to recognize subtle variations of hate speech, particularly implicit forms, emphasizes the need for supplementary methods.</p>
<p><bold>Method</bold>. This study investigates the potential of user-written flagging messages in enhancing hate speech detection, focusing on the characteristics and identification of flagging messages. We created a dataset of flagging messages and the comments they respond to, employing transformer-based models (BERT, RoBERTa, ALBERT, DistilBERT, and XLNet) for classification.</p>
<p><bold>Analysis</bold>. Linguistic analysis using SEANCE and Named Entity Recognition was conducted to reveal unique characteristics of flagging messages.</p>
<p><bold>Results</bold>. Our findings show that BERT and DistilBERT models achieved the highest accuracy in classifying flagging messages, with distinct linguistic patterns emerging in flagging content.</p>
<p><bold>Conclusion.</bold> This research contributes to the development of more nuanced hate speech detection methods by leveraging user-generated flagging content. These findings have implications for improving automated content moderation systems and supporting more inclusive online environments. Future work will focus on the effectiveness of flagging messages in identifying implicit hate speech across diverse cultural contexts. </p>
</abstract>
</article-meta>
</front>
<body>
<sec id="sec1">
<title>Introduction</title>
<p>Online hate speech poses significant threats to individuals and communities, exacerbating psychological harm, discrimination, and potential real-world violence. Extensive research has been dedicated to identifying and addressing this issue, with various studies utilizing automated methods, including natural language processing techniques, to detect and categorize hate speech on social media platforms (<xref rid="R16" ref-type="bibr">Salminen et al., 2020</xref>, <xref rid="R21" ref-type="bibr">Yu et al., 2022</xref>).</p>
<p>However, current detection methods primarily focus on explicit forms of hate, often overlooking more subtle manifestations and evolving abusive language. This limitation highlights the need for more nuanced approaches that can capture the complex dynamics of online hate. One potential source of insight that has been under-explored is user-written flagging messages.</p>
<p>Flagging is widely employed by users across social media platforms to report offensive content, as shown in <xref ref-type="fig" rid="F1">Figure 1</xref>, and it provides moderators with a rhetorical defence for content removal decisions (<xref rid="R3" ref-type="bibr">Crawford and Gillespie, 2016</xref>). In an environment where control and transparency are limited, flags are crucial in giving users a voice (Zhang et al., 2023). <xref rid="R2" ref-type="bibr">Chandrasekharan et al. (2018)</xref> found that online posts flagged by regular users and later reviewed by moderators were important in determining the best ways to intervene.</p>
<p>This study aims to identify flagging messages, laying a foundation for the empirical investigation of flagging messages in enhancing hate speech detection, particularly for subtle forms of abuse or microaggressions (<xref rid="R5" ref-type="bibr">De la Pe&#x00F1;a Sarrac&#x00E9;n and Rosso, 2023</xref>; <xref rid="R13" ref-type="bibr">MacAvaney et al., 2019</xref>). Specifically, we address the following research questions:
<list list-type="order">
<list-item><p>What are the linguistic features that distinguish flagging from non-flagging messages?</p></list-item>
<list-item><p>What NLP models are most effective in accurately detecting flagged content?</p></list-item>
<list-item><p>To what extent can flagging messages aid in identifying implicit hate, and what insights can they provide into this form of hate expression?</p></list-item>
</list></p>
<p>We aim to provide insights that can inform the development of more comprehensive and nuanced hate speech detection models, ultimately contributing to safer and more inclusive online environments.</p>
<fig id="F1">
<label>Figure 1.</label>
<caption><p>Online users post flagging messages to warn others. These messages indicate that the comment they respond to is harmful speech</p></caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="images/c69-fig1.jpg"><alt-text>none</alt-text></graphic>
</fig>
</sec>
<sec id="sec2">
<title>Related work</title>
<sec id="sec2_1">
<title>Implicit hate speech detection</title>
<p>Detecting implicit forms of hate speech poses significant challenges due to linguistic nuances and contextual dependencies. ElSherief et al. (2021) presented a comprehensive benchmark dataset aimed at understanding and detecting implicit hate speech, providing a foundation for evaluating detection algorithms. Ghosh et al. (2023) proposed CoSyn, a context-synergized hyperbolic network that considers both personal and dialogue context in conversation trees. Their approach highlights the importance of contextual information in this task. Jafari et al. (<xref rid="R8" ref-type="bibr">2023</xref>) investigated the influence of fine-grained emotions on implicit hate speech classification. Their research employed single-task and multi-task learning models and revealed that integrating emotional features improved hate speech detection. Recently, Ahn et al. (<xref rid="R1" ref-type="bibr">2024</xref>) introduced SharedCon, a model that leverages shared semantic information to enhance the detection of implicit hate speech.</p>
<p>These works collectively advance the understanding of implicit hate speech. However, as hate language evolves, curated implicit hate speech datasets can gradually become obsolete, degrading detection accuracy. We propose a detection method anchored in flagging messages, which are less likely to change because users have little motivation to alter how they flag.</p>
</sec>
</sec>
<sec id="sec3">
<title>Method</title>
<sec id="sec3_1">
<title>Data collection and processing</title>
<p>We started by compiling a small selection of comments from Reddit discussion threads. We collected a total of 1,050 pairs of hate speech comments and their corresponding replies from 39 subreddits that have been identified as hateful (<xref rid="R18" ref-type="bibr">Vidgen et al., 2021</xref>). Examples of these subreddits include r/TwoXChromosomes, r/TrueReddit, and r/ChangeMyView. We used Reddit&#x2019;s API to collect the comments while ensuring that we followed the platform&#x2019;s terms of service.</p>
<p>Our research team manually examined and labeled each reply message, using a binary (0/1) coding scheme to categorize messages based on their content, purpose, tone, effectiveness, and expression of rejection in addressing hate speech. Inter-rater agreement was calculated to ensure the reliability of the labeling process, yielding a Cohen&#x2019;s kappa of 0.81 and a Krippendorff&#x2019;s alpha of 0.82. Kappa coefficients above 0.80 indicate almost perfect agreement (<xref rid="R19" ref-type="bibr">Viera et al., 2005</xref>), demonstrating the reliability of our labels.</p>
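<p>For concreteness, Cohen&#x2019;s kappa adjusts raw agreement for the agreement expected by chance given each rater&#x2019;s label frequencies. A minimal pure-Python sketch (the two rating sequences are illustrative, not drawn from our dataset):</p>

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items."""
    n = len(rater_a)
    # observed agreement: fraction of items both raters label identically
    p_o = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    # chance agreement: product of each rater's marginal label frequencies
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[k] * count_b[k] for k in set(rater_a) | set(rater_b)) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# two illustrative binary (0/1) rating sequences
kappa = cohen_kappa([1, 0, 1, 1, 0, 0, 1, 0], [1, 0, 1, 0, 0, 0, 1, 1])
```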
</sec>
<sec id="sec3_2">
<title>Data analysis</title>
<p>We utilize the Sentiment Analysis and Social Cognition Engine (SEANCE) to perform in-depth sentiment, social cognition, and social order analysis on textual data. Its comprehensive set of indices and component scores allows for a nuanced and insightful analysis of language use, emotions, and social dynamics (<xref rid="R4" ref-type="bibr">Crossley et al., 2017</xref>). In addition to the SEANCE analysis, we conducted Named Entity Recognition (NER) analysis using spaCy (<xref rid="R14" ref-type="bibr">Naseer et al., 2021</xref>).</p>
<p>We use the Wilcoxon rank-sum test to compare SEANCE and NER features between flagging and non-flagging messages, then apply a Bonferroni correction to identify the most significant features.</p>
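<p>The per-feature comparison can be sketched with SciPy&#x2019;s rank-sum test; the feature values and the number of tests below are illustrative assumptions, not our data:</p>

```python
from scipy.stats import ranksums

# illustrative per-message scores for one linguistic feature (not real data)
flagging = [0.9, 0.8, 0.85, 0.95, 0.88]
non_flagging = [0.1, 0.2, 0.15, 0.12, 0.18]

# two-sided Wilcoxon rank-sum test comparing the two groups
stat, p_value = ranksums(flagging, non_flagging)

# Bonferroni correction: scale the p-value by the number of features tested
n_tests = 3  # assumption for this toy example
p_corrected = min(p_value * n_tests, 1.0)
```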
</sec>
<sec id="sec3_3">
<title>Classification experiments</title>
<p>We utilize several state-of-the-art transformer-based models, including BERT (<xref rid="R9" ref-type="bibr">Kenton and Toutanova, 2019</xref>), RoBERTa (<xref rid="R20" ref-type="bibr">Liu, 2019</xref>), ALBERT (<xref rid="R10" ref-type="bibr">Lan, 2019</xref>), DistilBERT (Sanh, 2019), and XLNet (<xref rid="R20" ref-type="bibr">Yang, 2019</xref>), to classify whether a comment is a flagging message.</p>
<p>The labeled data was split into training and testing sets at an 8:2 ratio, stratified on the label to preserve class balance. Because the &#x2019;flagging&#x2019; and &#x2019;non-flagging&#x2019; labels are imbalanced, we experiment with sampling and data augmentation methods to investigate whether these techniques benefit prediction performance. We first use undersampling, which reduces the number of records in the majority class; random undersampling has been empirically shown to be one of the most effective resampling methods, and few more sophisticated undersampling methods have outperformed it (<xref rid="R11" ref-type="bibr">Liu, 2004</xref>).</p>
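<p>Random undersampling of the majority class can be sketched in a few lines of pure Python; the fixed seed and toy data below are assumptions for illustration:</p>

```python
import random

def undersample(records, labels, seed=42):
    """Randomly downsample every class to the minority-class size."""
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    k = min(len(idx) for idx in by_class.values())  # minority-class size
    rng = random.Random(seed)
    keep = [i for idx in by_class.values() for i in rng.sample(idx, k)]
    return [records[i] for i in keep], [labels[i] for i in keep]

# toy imbalanced data: 8 non-flagging (0) vs. 2 flagging (1) replies
X, y = undersample(list(range(10)), [0] * 8 + [1] * 2)
```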
<p>We then expand our dataset to evaluate whether NLTK tools such as punkt, averaged_perceptron_tagger, and wordnet enhance our models&#x2019; performance. Punkt is a tokenizer that uses an unsupervised algorithm to divide text into a list of sentences, accounting for abbreviations, collocations, and sentence-initial words (<xref rid="R15" ref-type="bibr">Natural Language Toolkit, n.d.</xref>).</p>
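<p>The wordnet-based augmentation idea can be sketched with a toy synonym table standing in for <code>nltk.corpus.wordnet</code> lookups; the table and tokens below are hypothetical:</p>

```python
import random

# toy synonym table; the real pipeline would query WordNet via NLTK
SYNONYMS = {"bad": ["awful", "terrible"], "comment": ["remark", "post"]}

def augment(tokens, seed=0):
    """Create a paraphrased copy by swapping in random synonyms."""
    rng = random.Random(seed)
    return [rng.choice(SYNONYMS[t]) if t in SYNONYMS else t for t in tokens]

augmented = augment(["a", "bad", "comment"])
```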
<p>The models were trained using the PyTorch framework and the Hugging Face Transformers library. Model evaluation metrics included accuracy, precision, recall, and F1-score. We generated classification reports to provide detailed insights into model performance across different classes.</p>
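<p>The reported metrics reduce to counts of true and false positives and negatives per class; a self-contained sketch of the per-class computation (the toy gold labels and predictions are illustrative):</p>

```python
def prf(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for one class treated as positive."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return accuracy, precision, recall, f1

# toy gold labels and predictions (1 = flagging, 0 = non-flagging)
acc, prec, rec, f1 = prf([1, 1, 0, 0], [1, 0, 0, 1])
```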
</sec>
</sec>
<sec id="sec4">
<title>Results</title>
<p>In this section, we present detailed findings of our experiments, encompassing both the insights gained from linguistic analysis techniques and the performance of various transformer-based models.</p>
<sec id="sec4_1">
<title>Linguistic analysis</title>
<p>This analysis offers insights into the linguistic characteristics that differentiate flagging from non-flagging messages, providing a deeper understanding of their nature.</p>
<p>Our SEANCE analysis revealed several significant linguistic differences between flagging and non-flagging comments, with a particularly strong emphasis on negative emotion words in flagging content. Statistically significant differences (p &#x003C; 0.001) were observed in various measures. For example, anger (Lexicon) was considerably more prevalent in flagging comments, with a mean of 0.073 compared to 0.029 in non-flagging comments. Fear and disgust showed the largest difference, with means of 0.437 and 0.210 for flagging and non-flagging comments respectively. General negative emotions (Negative_EmoLex) were also more common in flagging comments (mean 0.104) compared to non-flagging ones (mean 0.052).</p>
<p>These findings indicate that flagging comments are characterized by significantly higher emotional intensity, specifically negative emotions such as anger, fear, and disgust. This is consistent with prior research on hate speech and offensive language, which frequently involve strong expressions of negative emotions (<xref rid="R21" ref-type="bibr">Yu et al., 2022</xref>). We also observed differences in other linguistic features. For example, the rate of second-person pronouns (You_GI) was higher in flagging comments, potentially indicating more direct, confrontational language. <xref ref-type="table" rid="T1">Table 1</xref> shows these findings.</p>
<table-wrap id="T1">
<label>Table 1.</label>
<caption><p>The most significant linguistic differences, grouped by variable category, with p-values before and after Bonferroni correction</p></caption>
<table>
<thead>
<tr>
<th align="center" valign="top"><bold>Variable Category</bold></th>
<th align="center" valign="top"><italic><bold>Variable</bold></italic></th>
<th align="center" valign="top"><italic><bold>Mean (Flagging)</bold></italic></th>
<th align="center" valign="top"><italic><bold>Mean (Non-flagging)</bold></italic></th>
<th align="center" valign="top"><italic><bold>p-Value</bold></italic></th>
<th align="center" valign="top"><italic><bold>Bonferroni Correction</bold></italic></th>
</tr>
</thead>
<tbody>
<tr>
<td align="center" valign="top" rowspan="10"><italic>Negative Emotion words</italic></td>
<td align="center" valign="top">Anger (Lexicon)</td>
<td align="center" valign="top">0.073</td>
<td align="center" valign="top">0.029</td>
<td align="center" valign="top">2.47E<sup>-41</sup></td>
<td align="center" valign="top">6.66E<sup>-39</sup></td>
</tr>
<tr>
<td align="center" valign="top">Fear and Disgust</td>
<td align="center" valign="top">0.437</td>
<td align="center" valign="top">0.210</td>
<td align="center" valign="top">6.30E<sup>-39</sup></td>
<td align="center" valign="top">1.70E<sup>-36</sup></td>
</tr>
<tr>
<td align="center" valign="top">Negative Emotions (Negative_EmoLex)</td>
<td align="center" valign="top">0.104</td>
<td align="center" valign="top">0.052</td>
<td align="center" valign="top">6.81E<sup>-38</sup></td>
<td align="center" valign="top">1.83E<sup>-35</sup></td>
</tr>
<tr>
<td align="center" valign="top">Fear (Lexicon)</td>
<td align="center" valign="top">0.072</td>
<td align="center" valign="top">0.029</td>
<td align="center" valign="top">7.45E<sup>-36</sup></td>
<td align="center" valign="top">2.01E<sup>-33</sup></td>
</tr>
<tr>
<td align="center" valign="top">Negative Adjectives</td>
<td align="center" valign="top">0.733</td>
<td align="center" valign="top">0.364</td>
<td align="center" valign="top">6.08E<sup>-10</sup></td>
<td align="center" valign="top">1.64E<sup>-07</sup></td>
</tr>
<tr>
<td align="center" valign="top">Negative Words</td>
<td align="center" valign="top">0.063</td>
<td align="center" valign="top">0.043</td>
<td align="center" valign="top">8.39E<sup>-10</sup></td>
<td align="center" valign="top">2.27E<sup>-07</sup></td>
</tr>
<tr>
<td align="center" valign="top">Negative Sentiment (VADER)</td>
<td align="center" valign="top">0.144</td>
<td align="center" valign="top">0.105</td>
<td align="center" valign="top">3.71E<sup>-09</sup></td>
<td align="center" valign="top">1.00E<sup>-06</sup></td>
</tr>
<tr>
<td align="center" valign="top">Hatred (Lexicon)</td>
<td align="center" valign="top">0.002</td>
<td align="center" valign="top">0.000</td>
<td align="center" valign="top">1.93E<sup>-05</sup></td>
<td align="center" valign="top">0.005</td>
</tr>
<tr>
<td align="center" valign="top">Disgust (Lexicon)</td>
<td align="center" valign="top">0.027</td>
<td align="center" valign="top">0.020</td>
<td align="center" valign="top">1.01E<sup>-04</sup></td>
<td align="center" valign="top">0.027</td>
</tr>
<tr>
<td align="center" valign="top">Negativity (Lexicon)</td>
<td align="center" valign="top">0.074</td>
<td align="center" valign="top">0.062</td>
<td align="center" valign="top">3.19E<sup>-04</sup></td>
<td align="center" valign="top">0.086</td>
</tr>
<tr>
<td align="center" valign="top"><italic>Reference</italic></td>
<td align="center" valign="top">Second-Person Pronouns (You_GI)</td>
<td align="center" valign="top">0.041</td>
<td align="center" valign="top">0.031</td>
<td align="center" valign="top">1.78E<sup>-06</sup></td>
<td align="center" valign="top">0.000</td>
</tr>
<tr>
<td align="center" valign="top" rowspan="2"><italic>Quality and quantity</italic></td>
<td align="center" valign="top">Numerical Mentions</td>
<td align="center" valign="top">0.004</td>
<td align="center" valign="top">0.009</td>
<td align="center" valign="top">9.64E<sup>-05</sup></td>
<td align="center" valign="top">0.026</td>
</tr>
<tr>
<td align="center" valign="top">Frequency References</td>
<td align="center" valign="top">0.010</td>
<td align="center" valign="top">0.005</td>
<td align="center" valign="top">3.06E<sup>-04</sup></td>
<td align="center" valign="top">0.083</td>
</tr>
<tr>
<td align="center" valign="top"><italic>Other affect</italic></td>
<td align="center" valign="top">Sentiment Compound (VADER)</td>
<td align="center" valign="top">-0.113</td>
<td align="center" valign="top">-0.005</td>
<td align="center" valign="top">1.02E<sup>-08</sup></td>
<td align="center" valign="top">2.75E<sup>-06</sup></td>
</tr>
<tr>
<td align="center" valign="top"><italic>Emotion words</italic></td>
<td align="center" valign="top">Positive Nouns</td>
<td align="center" valign="top">-0.177</td>
<td align="center" valign="top">-0.088</td>
<td align="center" valign="top">1.90E<sup>-07</sup></td>
<td align="center" valign="top">0.000</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="sec4_2">
<title>NER analysis</title>
<p>This analysis assists in uncovering contextual elements and specific entity types that might indicate potentially problematic content. The results show statistically significant differences for the PERSON and CARDINAL entity types. Person names (PERSON) are strongly associated with non-flagging comments, suggesting that comments mentioning specific individuals are less likely to be flagging messages. Likewise, the higher rate of cardinal numbers (CARDINAL) in non-flagging comments indicates that more factual or quantitative content is less likely to be a flagging message. These results are displayed in <xref ref-type="table" rid="T2">Table 2</xref>.</p>
<p>GPE (Geo-Political Entities) tends to appear more often in non-flagging comments. The trend with GPEs, while not significant after correction, hints that mentions of geographical or political entities might be more common in non-flagging comments.</p>
<table-wrap id="T2">
<label>Table 2.</label>
<caption><p>Entity types with the most prevalent differences, with p-values before and after Bonferroni correction</p></caption>
<table>
<thead>
<tr>
<th align="center" valign="top"><bold>Variable</bold></th>
<th align="center" valign="top"><italic><bold>Mean (Flagging)</bold></italic></th>
<th align="center" valign="top"><italic><bold>Mean (Non-flagging)</bold></italic></th>
<th align="center" valign="top"><italic><bold>p-value</bold></italic></th>
<th align="center" valign="top"><italic><bold>Bonferroni correction</bold></italic></th>
</tr>
</thead>
<tbody>
<tr>
<td align="center" valign="top"><italic>PERSON</italic></td>
<td align="center" valign="top">0.039</td>
<td align="center" valign="top">0.095</td>
<td align="center" valign="top">0.000</td>
<td align="center" valign="top">0.002</td>
</tr>
<tr>
<td align="center" valign="top"><italic>CARDINAL</italic></td>
<td align="center" valign="top">0.043</td>
<td align="center" valign="top">0.080</td>
<td align="center" valign="top">0.001</td>
<td align="center" valign="top">0.011</td>
</tr>
<tr>
<td align="center" valign="top"><italic>GPE</italic></td>
<td align="center" valign="top">0.038</td>
<td align="center" valign="top">0.064</td>
<td align="center" valign="top">0.015</td>
<td align="center" valign="top">0.212</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="sec4_3">
<title>Model performance</title>
<p>We examine the effectiveness of BERT, RoBERTa, ALBERT, DistilBERT, and XLNet models. We conduct experiments using various input configurations, including replies to hate speech alone (Child), and a combination of hate speech (Parent) and replies to it. <xref ref-type="table" rid="T3">Table 3</xref> presents these results.</p>
<p>BERT and DistilBERT show the highest overall performance, both achieving an accuracy of 0.81 and balanced F1 scores across labels. BERT, using only Child, demonstrates the highest F1 score (0.82) for flagging comments.</p>
<p>There is slight variation in how models handle different input configurations (Child only vs. Parent and Child). Most models show balanced performance between the flagging and non-flagging labels. The effect of including the Parent comment as context varies: BERT and XLNet perform best with Child only, suggesting they may be more sensitive to noise in Parent comments. In contrast, RoBERTa, ALBERT, and DistilBERT perform strongly with both Parent and Child, indicating better context integration capabilities.</p>
<p>DistilBERT&#x2019;s high performance is particularly noteworthy, as it is a compressed model designed for efficiency. The similar performance of BERT and DistilBERT suggests that DistilBERT might be more suitable in resource-constrained environments.</p>
<table-wrap id="T3">
<label>Table 3.</label>
<caption><p>Summary of best-performing model results, including the impact of different input configurations (Child only or Parent and child) on the model&#x2019;s performance</p></caption>
<table>
<thead>
<tr>
<th align="center" valign="top" rowspan="2"><bold>Model</bold></th>
<th align="center" valign="top" rowspan="2"><italic><bold>Input</bold></italic></th>
<th align="center" valign="top" colspan="3"><bold>Non-Flagging</bold></th>
<th align="center" valign="top" colspan="3"><bold>Flagging</bold></th>
<th align="center" valign="top" colspan="3"><bold>Weighted Average</bold></th>
<th align="center" valign="top" rowspan="2"><italic><bold>Accuracy</bold></italic></th>
</tr>
<tr>
<th align="center" valign="top"><bold>P</bold></th>
<th align="center" valign="top"><bold>R</bold></th>
<th align="center" valign="top"><bold>F1</bold></th>
<th align="center" valign="top"><bold>P</bold></th>
<th align="center" valign="top"><bold>R</bold></th>
<th align="center" valign="top"><bold>F1</bold></th>
<th align="center" valign="top"><bold>P</bold></th>
<th align="center" valign="top"><bold>R</bold></th>
<th align="center" valign="top"><bold>F1</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td align="center" valign="top"><italic>BERT</italic></td>
<td align="center" valign="top">Child only</td>
<td align="center" valign="top">0.84</td>
<td align="center" valign="top">0.76</td>
<td align="center" valign="top">0.80</td>
<td align="center" valign="top">0.78</td>
<td align="center" valign="top">0.85</td>
<td align="center" valign="top"><bold>0.82</bold></td>
<td align="center" valign="top">0.81</td>
<td align="center" valign="top">0.81</td>
<td align="center" valign="top"><bold>0.81</bold></td>
<td align="center" valign="top"><bold>0.81</bold></td>
</tr>
<tr>
<td align="center" valign="top"><italic>RoBERTa</italic></td>
<td align="center" valign="top">Parent and child</td>
<td align="center" valign="top">0.77</td>
<td align="center" valign="top">0.81</td>
<td align="center" valign="top">0.79</td>
<td align="center" valign="top">0.80</td>
<td align="center" valign="top">0.75</td>
<td align="center" valign="top">0.77</td>
<td align="center" valign="top">0.78</td>
<td align="center" valign="top">0.78</td>
<td align="center" valign="top">0.78</td>
<td align="center" valign="top">0.78</td>
</tr>
<tr>
<td align="center" valign="top"><italic>ALBERT</italic></td>
<td align="center" valign="top">Parent and child</td>
<td align="center" valign="top">0.82</td>
<td align="center" valign="top">0.77</td>
<td align="center" valign="top">0.79</td>
<td align="center" valign="top">0.78</td>
<td align="center" valign="top">0.83</td>
<td align="center" valign="top">0.80</td>
<td align="center" valign="top">0.80</td>
<td align="center" valign="top">0.80</td>
<td align="center" valign="top">0.80</td>
<td align="center" valign="top">0.80</td>
</tr>
<tr>
<td align="center" valign="top"><italic>DistilBERT</italic></td>
<td align="center" valign="top">Parent and child</td>
<td align="center" valign="top">0.82</td>
<td align="center" valign="top">0.80</td>
<td align="center" valign="top">0.81</td>
<td align="center" valign="top">0.80</td>
<td align="center" valign="top">0.82</td>
<td align="center" valign="top">0.81</td>
<td align="center" valign="top">0.81</td>
<td align="center" valign="top">0.81</td>
<td align="center" valign="top"><bold>0.81</bold></td>
<td align="center" valign="top"><bold>0.81</bold></td>
</tr>
<tr>
<td align="center" valign="top"><italic>XLNet</italic></td>
<td align="center" valign="top">Child only</td>
<td align="center" valign="top">0.80</td>
<td align="center" valign="top">0.77</td>
<td align="center" valign="top">0.79</td>
<td align="center" valign="top">0.78</td>
<td align="center" valign="top">0.81</td>
<td align="center" valign="top">0.79</td>
<td align="center" valign="top">0.79</td>
<td align="center" valign="top">0.79</td>
<td align="center" valign="top">0.79</td>
<td align="center" valign="top">0.79</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
</sec>
<sec id="sec5">
<title>Conclusion</title>
<p>Our study provides insights into the linguistic characteristics of flagging comments. Combining linguistic analysis, NER analysis, and transformer-based models, it reveals several key findings:</p>
<p>The linguistic analysis demonstrated a strong association between negative emotional language and flagging comments. The NER analysis revealed that certain types of named entities, particularly PERSON and CARDINAL, are differentially associated with flagging and non-flagging comments. These findings could be instrumental in identifying flagging messages, which in turn could inform more sophisticated content moderation tools. The evaluation of BERT, RoBERTa, ALBERT, DistilBERT, and XLNet demonstrated their effectiveness in detecting flagging messages. Context should be carefully considered when designing flagging systems.</p>
</sec>
<sec id="sec6">
<title>Limitations and future directions</title>
<p>We have identified flagging messages; how they could enhance the detection of implicit or varied forms of hate speech will be investigated in the next step, as this work is still in progress. While our study provides valuable insights, it is important to recognize its limitations. The binary classification of comments as flagging or not may oversimplify the complex nature of online discourse. Future research will explore more granular categorizations, considering the nuances of different types and intensities of negative emotions and their relationship to various forms of online harm, such as implicit hate. Large language models may perform better at identifying flagging messages, which we will experiment with in future work.</p>
<p>This study contributes to ongoing efforts to create safer and healthier online discussions and communities. Yet the scope of our research can be expanded to address a broader range of online harm indicators.</p>
</sec>
</body>
<back>
<ack>
<title>Acknowledgements</title>
<p>The authors gratefully acknowledge the financial support from the Institute of Museum and Library Services under Grant LG-256661-OLS-24 and LG-256666-OLS-24. Sharon Lisseth Perez also acknowledges the support of the Fulbright Program for funding her PhD studies at the University of North Texas. The authors would like to thank the collaborators who contributed to this research.</p>
</ack>
<ref-list>
<title>References</title>
<ref id="R1"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>Ahn</surname><given-names>H.</given-names></name><name><surname>Kim</surname><given-names>Y.</given-names></name><name><surname>Kim</surname><given-names>J.</given-names></name><name><surname>Han</surname><given-names>Y. S.</given-names></name></person-group><year>2024</year><comment>August</comment><article-title>SharedCon: Implicit hate speech detection using shared semantics</article-title><source>Findings of the Association for Computational Linguistics ACL 2024</source><fpage>10444</fpage><lpage>10455</lpage><ext-link ext-link-type="uri" xlink:href="https://aclanthology.org/2024.findings-acl.622/">https://aclanthology.org/2024.findings-acl.622/</ext-link></element-citation></ref>
<ref id="R2"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Chandrasekharan</surname><given-names>E.</given-names></name><name><surname>Samory</surname><given-names>M.</given-names></name><name><surname>Jhaver</surname><given-names>S.</given-names></name><name><surname>Charvat</surname><given-names>H.</given-names></name><name><surname>Bruckman</surname><given-names>A.</given-names></name><name><surname>Lampe</surname><given-names>C.</given-names></name><name><surname>Gilbert</surname><given-names>E.</given-names></name></person-group><year>2018</year><article-title>The Internet&#x2019;s hidden rules: An empirical study of Reddit norm violations at micro, meso, and macro scales</article-title><source>Proceedings of the ACM on Human-Computer Interaction</source><volume>2</volume><issue>CSCW</issue><fpage>1</fpage><lpage>25</lpage><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1145/3274301">https://doi.org/10.1145/3274301</ext-link></element-citation></ref>
<ref id="R3"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Crawford</surname><given-names>K.</given-names></name><name><surname>Gillespie</surname><given-names>T.</given-names></name></person-group><year>2016</year><article-title>What is a flag for? Social media reporting tools and the vocabulary of complaint</article-title><source>New Media &#x0026; Society</source><volume>18</volume><issue>3</issue><fpage>410</fpage><lpage>428</lpage><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1177/1461444814543163">https://doi.org/10.1177/1461444814543163</ext-link></element-citation></ref>
<ref id="R4"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Crossley</surname><given-names>S. A.</given-names></name><name><surname>Kyle</surname><given-names>K.</given-names></name><name><surname>McNamara</surname><given-names>D. S.</given-names></name></person-group><year>2017</year><article-title>Sentiment Analysis and Social Cognition Engine (SEANCE): An automatic tool for sentiment, social cognition, and social-order analysis</article-title><source>Behavior research methods</source><volume>49</volume><fpage>803</fpage><lpage>821</lpage><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3758/s13428-016-0743-z">https://doi.org/10.3758/s13428-016-0743-z</ext-link></element-citation></ref>
<ref id="R5"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>De la Pe&#x00F1;a Sarrac&#x00E9;n</surname><given-names>G. L.</given-names></name><name><surname>Rosso</surname><given-names>P.</given-names></name></person-group><year>2023</year><article-title>Systematic keyword and bias analyses in hate speech detection</article-title><source>Information Processing &#x0026; Management</source><volume>60</volume><issue>5</issue><fpage>103433</fpage><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1016/j.ipm.2023.103433">https://doi.org/10.1016/j.ipm.2023.103433</ext-link></element-citation></ref>
<ref id="R6"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>ElSherief</surname><given-names>M.</given-names></name><name><surname>Ziems</surname><given-names>C.</given-names></name><name><surname>Muchlinski</surname><given-names>D.</given-names></name><name><surname>Anupindi</surname><given-names>V.</given-names></name><name><surname>Seybolt</surname><given-names>J.</given-names></name><name><surname>De Choudhury</surname><given-names>M.</given-names></name><name><surname>Yang</surname><given-names>D.</given-names></name></person-group><year>2021</year><comment>November</comment><article-title>Latent Hatred: A Benchmark for Understanding Implicit Hate Speech</article-title><source>Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing</source><fpage>345</fpage><lpage>363</lpage><ext-link ext-link-type="uri" xlink:href="https://aclanthology.org/2021.emnlp-main.29/">https://aclanthology.org/2021.emnlp-main.29/</ext-link></element-citation></ref>
<ref id="R7"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>Ghosh</surname><given-names>S.</given-names></name><name><surname>Suri</surname><given-names>M.</given-names></name><name><surname>Chiniya</surname><given-names>P.</given-names></name><name><surname>Tyagi</surname><given-names>U.</given-names></name><name><surname>Kumar</surname><given-names>S.</given-names></name><name><surname>Manocha</surname><given-names>D.</given-names></name></person-group><year>2023</year><article-title>CoSyn: Detecting Implicit Hate Speech in Online Conversations Using a Context Synergized Hyperbolic Network</article-title><source>arXiv preprint arXiv:2303.03387</source><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.48550/arXiv.2303.03387">https://doi.org/10.48550/arXiv.2303.03387</ext-link></element-citation></ref>
<ref id="R8"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>Jafari</surname><given-names>A. R.</given-names></name><name><surname>Li</surname><given-names>G.</given-names></name><name><surname>Rajapaksha</surname><given-names>P.</given-names></name><name><surname>Farahbakhsh</surname><given-names>R.</given-names></name><name><surname>Crespi</surname><given-names>N.</given-names></name></person-group><year>2023</year><article-title>Fine-grained emotions influence on implicit hate speech detection</article-title><source>IEEE Access</source><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1109/ACCESS.2023.3318863">https://doi.org/10.1109/ACCESS.2023.3318863</ext-link></element-citation></ref>
<ref id="R9"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Kenton</surname><given-names>J. D. M. W. C.</given-names></name><name><surname>Toutanova</surname><given-names>L. K.</given-names></name></person-group><year>2019</year><comment>June</comment><article-title>Bert: Pre-training of deep bidirectional transformers for language understanding</article-title><source>Proceedings of naacL-HLT</source><volume>1</volume><fpage>2</fpage><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.48550/arXiv.1810.04805">https://doi.org/10.48550/arXiv.1810.04805</ext-link></element-citation></ref>
<ref id="R10"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>Lan</surname><given-names>Z.</given-names></name></person-group><year>2019</year><article-title>Albert: A lite bert for self-supervised learning of language representations</article-title><source>arXiv preprint arXiv:1909.11942</source><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.48550/arXiv.1909.11942">https://doi.org/10.48550/arXiv.1909.11942</ext-link></element-citation></ref>
<ref id="R11"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>Liu</surname><given-names>A. Y. C.</given-names></name></person-group><year>2004</year><article-title>The effect of oversampling and undersampling on classifying imbalanced text datasets</article-title><ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.26153/tsw/12300">http://dx.doi.org/10.26153/tsw/12300</ext-link></element-citation></ref>
<ref id="R12"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>Liu</surname><given-names>Y.</given-names></name></person-group><year>2019</year><article-title>Roberta: A robustly optimized bert pretraining approach</article-title><source>arXiv preprint arXiv:1907.11692</source><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.48550/arXiv.1907.11692">https://doi.org/10.48550/arXiv.1907.11692</ext-link></element-citation></ref>
<ref id="R13"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>MacAvaney</surname><given-names>S.</given-names></name><name><surname>Yao</surname><given-names>H. R.</given-names></name><name><surname>Yang</surname><given-names>E.</given-names></name><name><surname>Russell</surname><given-names>K.</given-names></name><name><surname>Goharian</surname><given-names>N.</given-names></name><name><surname>Frieder</surname><given-names>O.</given-names></name></person-group><year>2019</year><article-title>Hate speech detection: Challenges and solutions</article-title><source>PloS one</source><volume>14</volume><issue>8</issue><fpage>e0221152</fpage><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1371/journal.pone.0221152">https://doi.org/10.1371/journal.pone.0221152</ext-link></element-citation></ref>
<ref id="R14"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Naseer</surname><given-names>S.</given-names></name><name><surname>Ghafoor</surname><given-names>M. M.</given-names></name><name><surname>bin Khalid Alvi</surname><given-names>S.</given-names></name><name><surname>Kiran</surname><given-names>A.</given-names></name><name><surname>Rahmand</surname><given-names>S. U.</given-names></name><name><surname>Murtazae</surname><given-names>G.</given-names></name><name><surname>Murtaza</surname><given-names>G.</given-names></name></person-group><year>2021</year><article-title>Named Entity Recognition (NER) in NLP Techniques, Tools Accuracy and Performance</article-title><source>Pakistan Journal of Multidisciplinary Research</source><volume>2</volume><issue>2</issue><fpage>293</fpage><lpage>308</lpage><ext-link ext-link-type="uri" xlink:href="https://pjmr.org/pjmr/article/view/150">https://pjmr.org/pjmr/article/view/150</ext-link></element-citation></ref>
<ref id="R15"><element-citation publication-type="other"><person-group person-group-type="author"><collab>Natural Language Toolkit</collab></person-group><comment>n.d.</comment><source>NLTK 3.0 documentation</source><ext-link ext-link-type="uri" xlink:href="https://www.nltk.org/index.html">https://www.nltk.org/index.html</ext-link></element-citation></ref>
<ref id="R16"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Salminen</surname><given-names>J.</given-names></name><name><surname>Hopf</surname><given-names>M.</given-names></name><name><surname>Chowdhury</surname><given-names>S. A.</given-names></name><name><surname>Jung</surname><given-names>S. G.</given-names></name><name><surname>Almerekhi</surname><given-names>H.</given-names></name><name><surname>Jansen</surname><given-names>B. J.</given-names></name></person-group><year>2020</year><article-title>Developing an online hate classifier for multiple social media platforms</article-title><source>Human-centric Computing and Information Sciences</source><volume>10</volume><fpage>1</fpage><lpage>34</lpage><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1186/s13673-019-0205-6">https://doi.org/10.1186/s13673-019-0205-6</ext-link></element-citation></ref>
<ref id="R17"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>Sanh</surname><given-names>V.</given-names></name></person-group><year>2019</year><article-title>DistilBERT, A Distilled Version of BERT: Smaller, Faster, Cheaper and Lighter</article-title><source>arXiv preprint arXiv:1910.01108</source><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.48550/arXiv.1910.01108">https://doi.org/10.48550/arXiv.1910.01108</ext-link></element-citation></ref>
<ref id="R18"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>Vidgen</surname><given-names>B.</given-names></name><name><surname>Nguyen</surname><given-names>D.</given-names></name><name><surname>Margetts</surname><given-names>H.</given-names></name><name><surname>Rossini</surname><given-names>P.</given-names></name><name><surname>Tromble</surname><given-names>R.</given-names></name></person-group><year>2021</year><article-title>Introducing CAD: the contextual abuse dataset</article-title><source>Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source><fpage>2289</fpage><lpage>2303</lpage><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.18653/v1/2021.naacl-main.182">https://doi.org/10.18653/v1/2021.naacl-main.182</ext-link></element-citation></ref>
<ref id="R19"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Viera</surname><given-names>A. J.</given-names></name><name><surname>Garrett</surname><given-names>J. M.</given-names></name></person-group><year>2005</year><article-title>Understanding interobserver agreement: the kappa statistic</article-title><source>Fam med</source><volume>37</volume><issue>5</issue><fpage>360</fpage><lpage>363</lpage><ext-link ext-link-type="uri" xlink:href="https://api.semanticscholar.org/CorpusID:38150955">https://api.semanticscholar.org/CorpusID:38150955</ext-link></element-citation></ref>
<ref id="R20"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>Yang</surname><given-names>Z.</given-names></name></person-group><year>2019</year><article-title>XLNet: Generalized Autoregressive Pretraining for Language Understanding</article-title><source>arXiv preprint arXiv:1906.08237</source><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.48550/arXiv.1906.08237">https://doi.org/10.48550/arXiv.1906.08237</ext-link></element-citation></ref>
<ref id="R21"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>Yu</surname><given-names>X.</given-names></name><name><surname>Blanco</surname><given-names>E.</given-names></name><name><surname>Hong</surname><given-names>L.</given-names></name></person-group><year>2022</year><article-title>Hate Speech and Counter Speech Detection: Conversational Context Does Matter</article-title><source>Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source><fpage>5918</fpage><lpage>5930</lpage><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.48550/arXiv.2206.06423">https://doi.org/10.48550/arXiv.2206.06423</ext-link></element-citation></ref>
<ref id="R22"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>Zhang</surname><given-names>A. Q.</given-names></name><name><surname>Montague</surname><given-names>K.</given-names></name><name><surname>Jhaver</surname><given-names>S.</given-names></name></person-group><year>2023</year><article-title>Cleaning Up the Streets: Understanding Motivations, Mental Models, and Concerns of Users Flagging Social Media Posts</article-title><source>arXiv preprint arXiv:2309.06688</source><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.48550/arXiv.2309.06688">https://doi.org/10.48550/arXiv.2309.06688</ext-link></element-citation></ref>
</ref-list>
</back>
</article>