<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" "http://jats.nlm.nih.gov/publishing/1.0/JATS-journalpublishing1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" article-type="research-article" xml:lang="en">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">IR</journal-id>
<journal-title-group>
<journal-title>Information Research</journal-title>
</journal-title-group>
<issn pub-type="epub">1368-1613</issn>
<publisher>
<publisher-name>University of Bor&#x00E5;s</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">ir30iConf47572</article-id>
<article-id pub-id-type="doi">10.47989/ir30iConf47572</article-id>
<article-categories>
<subj-group xml:lang="en">
<subject>Research article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Finding Pareto trade-offs in fair and accurate detection of toxic speech</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author"><name><surname>Gupta</surname><given-names>Soumyajit</given-names></name>
<xref ref-type="aff" rid="aff0001"/></contrib>
<contrib contrib-type="author"><name><surname>Kovatchev</surname><given-names>Venelin</given-names></name>
<xref ref-type="aff" rid="aff0002"/></contrib>
<contrib contrib-type="author"><name><surname>Das</surname><given-names>Anubrata</given-names></name>
<xref ref-type="aff" rid="aff0003"/></contrib>
<contrib contrib-type="author"><name><surname>De-Arteaga</surname><given-names>Maria</given-names></name>
<xref ref-type="aff" rid="aff0004"/></contrib>
<contrib contrib-type="author"><name><surname>Lease</surname><given-names>Matthew</given-names></name>
<xref ref-type="aff" rid="aff0005"/></contrib>
<aff id="aff0001"><bold>Soumyajit Gupta</bold> is a graduate student in Computer Science, University of Texas at Austin, USA. He received his Ph.D. from UT Austin and his research interests are in Machine Learning and Interpretable Neural Design. He can be contacted at <email xlink:href="smjtgupta@utexas.edu">smjtgupta@utexas.edu</email></aff>
<aff id="aff0002"><bold>Venelin Kovatchev</bold> is an Associate Professor in Computer Science, University of Birmingham, UK. He did his Post Doctorate from UT Austin and his research interests are in Computational Linguistics and Natural Language Processing. He can be contacted at <email xlink:href="v.o.kovatchev@bham.ac.uk">v.o.kovatchev@bham.ac.uk</email></aff>
<aff id="aff0003"><bold>Anubrata Das</bold> is a graduate student in School of Information, University of Texas at Austin, USA. He is under PhD candidacy at UT Austin and his research interests are in Human Computer Interactions and Natural Language Processing. He can be contacted at <email xlink:href="anubrata.das@utexas.edu">anubrata.das@utexas.edu</email></aff>
<aff id="aff0004"><bold>Maria De-Arteaga</bold> is an Assistant Professor in McCombs School of Business, University of Texas at Austin, USA. She is a Good Systems researcher. Her research interests are in Algorithmic Fairness and Human-AI complementarity. She can be contacted at <email xlink:href="dearteaga@mccombs.utexas.edu">dearteaga@mccombs.utexas.edu</email></aff>
<aff id="aff0005"><bold>Matthew Lease</bold> is a Professor in School of Information, University of Texas at Austin, USA. He is a Good Systems researcher and co-director of UT&#x2019;s CosmicAI Project. His research interests are in Artificial Intelligence and Human-Computer Interaction. He can be contacted at <email xlink:href="ml@utexas.edu">ml@utexas.edu</email></aff>
</contrib-group>
<pub-date pub-type="epub"><day>06</day><month>05</month><year>2025</year></pub-date>
<pub-date pub-type="collection"><year>2025</year></pub-date>
<volume>30</volume>
<issue>i</issue>
<fpage>123</fpage>
<lpage>141</lpage>
<permissions>
<copyright-year>2025</copyright-year>
<copyright-holder>&#x00A9; 2025 The Author(s).</copyright-holder>
<license license-type="open-access" xlink:href="https://creativecommons.org/licenses/by-nc/4.0/">
<license-p>This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by-nc/4.0/">http://creativecommons.org/licenses/by-nc/4.0/</ext-link>), permitting all non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<abstract xml:lang="en">
<title>Abstract</title>
<p><bold>Introduction.</bold> Optimizing NLP models for fairness poses many challenges. Lack of differentiable fairness measures prevents gradient-based loss training or requires surrogate losses that diverge from the true metric of interest. In addition, competing objectives (e.g., accuracy vs. fairness) often require making trade-offs based on stakeholder preferences, but stakeholders may not know their preferences before seeing system performance under different trade-off settings.</p>
<p><bold>Method.</bold> We formulate the GAP loss, a differentiable version of a fairness measure, Accuracy Parity, to provide balanced accuracy across binary demographic groups.</p>
<p><bold>Analysis.</bold> We show how model-agnostic, <italic>HyperNetwork</italic> optimization can efficiently train arbitrary NLP model architectures to learn <italic>Pareto-optimal</italic> trade-offs between competing metrics like predictive performance vs. group fairness.</p>
<p><bold>Results.</bold> Focusing on the task of toxic language detection, we show the generality and efficacy of our proposed GAP loss function across two datasets, three neural architectures, and three fairness loss functions.</p>
<p><bold>Conclusion.</bold> Our GAP loss for toxic language (TL) detection demonstrates promising results: improved fairness and computational efficiency. Our work can be extended to other tasks, datasets, and neural models in any practical situation where ensuring equal accuracy across different demographic groups is a desired objective.</p>
</abstract>
</article-meta>
</front>
<body>
<sec id="sec1">
<title>Introduction</title>
<p>Toxic language in social media is often associated with various risks and harms: cyberbullying, discrimination, mental health harms, and even hate crimes. Given the massive volume of user-generated content online, manual review of all posts by human moderators simply does not scale. Consequently, natural language processing (NLP) methods have been developed to fully or partially automate toxicity detection (<xref rid="R55" ref-type="bibr">Schmidt &#x0026; Wiegand, 2017</xref>). Prior work has achieved high Accuracy and F1 scores on toxicity detection (e.g., <xref rid="R66" ref-type="bibr">Zampieri et al., 2020</xref>) across various model architectures: e.g., convolutional (CNN) (<xref rid="R27" ref-type="bibr">Gamb&#x00E4;ck &#x0026; Sikdar, 2017</xref>), sequential (BiLSTM) (<xref rid="R28" ref-type="bibr">Graves et al., 2005</xref>), and transformer (BERT) (<xref rid="R19" ref-type="bibr">Devlin et al., 2018</xref>). However, studies have also found that model accuracy can vary greatly across sensitive demographic attributes, such as race or gender (<xref rid="R16" ref-type="bibr">Das et al., 2021</xref>; <xref rid="R48" ref-type="bibr">Park et al., 2018</xref>; <xref rid="R54" ref-type="bibr">Sap et al., 2019</xref>). Subjectivity in annotation for such tasks arises from the personal biases and experiences of annotators, and traditional approaches that rely on majority voting to resolve disagreements oversimplify the task. For example, a BERT-based classifier obtains 90.4% vs. 84.5% accuracy for White vs. African American authors on Davidson&#x2019;s dataset (<xref rid="R18" ref-type="bibr">Davidson et al., 2017</xref>) when optimized only for overall accuracy, independent of author groups. 
Thus, in subjective domains, the minority viewpoint plays an important role (<xref rid="R53" ref-type="bibr">Sang &#x0026; Stanton, 2022</xref>), where context and interpretations around data collections (<xref rid="R51" ref-type="bibr">Rahman et al., 2022</xref>), sources (<xref rid="R12" ref-type="bibr">Chaudhry &#x0026; Lease, 2022</xref>), and targets can heavily influence judgments.</p>
<p>While recent years have seen rapid progress in fairness research, fairness is often measured in a post hoc manner, and optimization is often indirect (<italic>e.g.,</italic> by improving training data through pre- or post-processing) (<xref rid="R54" ref-type="bibr">Sap et al., 2019</xref>). A particular challenge is that most existing measures are non-differentiable and thus cannot be optimized directly via gradient descent. While one can optimize differentiable surrogate loss functions instead, this risks <italic>metric divergence</italic> between the optimization criteria used in training <italic>vs.</italic> the actual metrics of interest (<xref rid="R43" ref-type="bibr">Metzler &#x0026; Croft, 2007</xref>; <xref rid="R45" ref-type="bibr">Morgan et al., 2004</xref>; <xref rid="R60" ref-type="bibr">Swezey et al., 2021</xref>; <xref rid="R65" ref-type="bibr">Yue et al., 2007</xref>).</p>
<p>As (<xref rid="R25" ref-type="bibr">Friedler et al., 2021</xref>) and others have noted, different worldviews lead to conflicting, mutually incompatible definitions of fairness, so specific fairness measures must be selected to suit the given task, context, and stakeholders at hand. In this work, we adopt a popular fairness objective, Accuracy Parity (<xref rid="R67" ref-type="bibr">Zhao et al., 2020</xref>), to optimize a model to provide balanced accuracy across demographic groups (<xref rid="R7" ref-type="bibr">Berk et al., 2021</xref>; <xref rid="R16" ref-type="bibr">Das et al., 2021</xref>; <xref rid="R32" ref-type="bibr">Heidari et al., 2019</xref>; <xref rid="R44" ref-type="bibr">Mitchell et al., 2021</xref>). Because no differentiable version of this measure exists, we formulate a novel, differentiable version, <italic>Group Accuracy Parity</italic> (GAP), that can be directly used to optimize descent-based models. We provide both a theoretical derivation and an empirical justification for GAP.</p>
<p>However, optimizing GAP alone may reduce Overall Accuracy (OA), since seeking to better fit the minority group may lead to a worse fit of the majority group, which tends to drive OA. Ultimately, we face a trade-off between competing objectives, whether we balance competing accuracy goals (<italic>e.g.,</italic> precision <italic>vs.</italic> recall), fairness goals, or any combination thereof. Multi-Objective Optimization (MOO) provides a principled framework and rigorous toolbox for approaching such competing trade-offs, instead of treating them as single-objective regularization problems (<xref rid="R39" ref-type="bibr">Little, 2023</xref>; <xref rid="R57" ref-type="bibr">Sorensen et al., 2024</xref>; <xref rid="R58" ref-type="bibr">Soto et al., 2022</xref>; <xref rid="R59" ref-type="bibr">Suau et al., 2024</xref>). We believe such MOO work remains underexplored in NLP today, and to the best of our knowledge, ours is the first NLP work on MOO for fair toxic language detection.</p>
<p>Because competing objectives typically lack global optima, optimization requires choosing among a set of equally valid, <italic>Pareto-optimal</italic> trade-offs between objectives. Naturally, selection of a suitable trade-off depends on stakeholder needs, and they typically wish to see system performance under real trade-off conditions before having to commit to any particular trade-off. We demonstrate how the full Pareto manifold &#x2013; for <italic>any</italic> underlying model architecture &#x2013; can be efficiently induced, provided optimization can be performed via gradient descent (with differentiable loss objectives). This is accomplished via recent advances in <italic>Pareto front learning</italic> (PFL) (<xref rid="R29" ref-type="bibr">Gupta et al., 2022</xref>; X. <xref rid="R37" ref-type="bibr">Lin et al., 2020</xref>; <xref rid="R47" ref-type="bibr">Navon et al., 2021</xref>) for <italic>HyperNetworks</italic> (<xref rid="R30" ref-type="bibr">Ha et al., 2017</xref>), which train one neural model to generate effective weights for a second, target model.</p>
<p>In summary, we pursue two distinct and complementary approaches for fair toxic language detection via model optimization. First, recognizing the repeated call for balancing accuracy across demographic groups, yet finding no differentiable metric doing so, we present the first differentiable version, GAP, enabling optimization for the first time via standard gradient descent. Our results show a clear benefit of optimizing directly for the target metric of interest rather than surrogate loss functions that diverge from it. Second, to demonstrate generality of PFL optimization over competing objectives, we induce the full Pareto front of optimal trade-offs between OA <italic>vs.</italic> three different fairness measures: GAP and two prior measures. To show generality of both techniques &#x2013; single-objective GAP and multiobjective PFL &#x2013; we show optimization over three distinct neural architectures (CNN, BiLSTM, and BERT) on two datasets: Davidson (<xref rid="R18" ref-type="bibr">Davidson et al., 2017</xref>) and Wilds (<xref rid="R34" ref-type="bibr">Koh et al., 2021</xref>).</p>
<p>Our results show that GAP better balances accuracy across demographic groups (authors and targets of potentially toxic tweets) than existing differentiable measures. With multi-objective PFL, we show that we can successfully induce the full manifold of Pareto-optimal trade-offs across all differentiable objectives and neural architectures considered. GAP also achieves the best empirical trade-offs for OA vs. balanced accuracy in comparison to the two other fairness metrics considered. Finally, we note that GAP and PFL are broadly applicable and can be adapted for a wide range of NLP tasks, beyond the task of toxicity detection. For reproducibility and adoption, we provide our GAP source code.</p>
</sec>
<sec id="sec2">
<title>Related work</title>
<sec id="sec2_1">
<title>Toxic language detection and fairness</title>
<p>Many datasets now exist to train and test automated systems for toxic language (TL) detection (<xref rid="R50" ref-type="bibr">Poletto et al., 2021</xref>; <xref rid="R62" ref-type="bibr">Vidgen &#x0026; Derczynski, 2020</xref>). Many NLP models have been proposed and continue to increase the overall accuracy of detection (<xref rid="R23" ref-type="bibr">Fortuna &#x0026; Nunes, 2018</xref>; <xref rid="R41" ref-type="bibr">MacAvaney et al., 2019</xref>; <xref rid="R55" ref-type="bibr">Schmidt &#x0026; Wiegand, 2017</xref>). However, recent studies highlight the racial bias induced in such classification tasks. Davidson <italic>et al</italic>. (<xref rid="R18" ref-type="bibr">Davidson et al., 2017</xref>) introduced a dataset comprising a corpus of tweets collected from social media with human annotations of each tweet&#x2019;s toxicity. Sap <italic>et al</italic>. (<xref rid="R54" ref-type="bibr">Sap et al., 2019</xref>) and Davidson <italic>et al</italic>. (<xref rid="R17" ref-type="bibr">Davidson et al., 2019</xref>) analyse the correlation between race and gold toxicity labels in the (<xref rid="R18" ref-type="bibr">Davidson et al., 2017</xref>) dataset and find a strong association between AAE markers and toxicity annotations; both works noisily infer author dialect via Blodgett <italic>et al</italic>. (<xref rid="R8" ref-type="bibr">Blodgett et al., 2017</xref>)&#x2019;s model as a proxy for race. The Wilds (<xref rid="R34" ref-type="bibr">Koh et al., 2021</xref>) dataset contains targets of TL with different demographic information and majority-voted human annotations. It provides predefined training/test splits for effectively measuring distribution shifts in TL models.</p>
<p>To address the problem of bias in automatic TL detection, some work has focused on improving the training and testing data (<xref rid="R48" ref-type="bibr">Park et al., 2018</xref>; <xref rid="R52" ref-type="bibr">R&#x00F6;ttger et al., 2021</xref>; <xref rid="R54" ref-type="bibr">Sap et al., 2019</xref>), with the expectation that fairer data will lead to fairer learned models. The work most similar to ours, by Xia <italic>et al</italic>. (<xref rid="R64" ref-type="bibr">Xia et al., 2020</xref>), Ball-Burack <italic>et al</italic>. (<xref rid="R5" ref-type="bibr">Ball-Burack et al., 2021</xref>), and Shen <italic>et al</italic>. (<xref rid="R56" ref-type="bibr">Shen et al., 2022</xref>), seeks to reduce the bias towards AAE authors in the algorithm rather than the data.</p>
<sec id="sec2_1_1">
<title>Fairness measures</title>
<p>The amplification of systemic unfairness through AI applications has been pronounced across critical application areas such as hiring, finance, legal applications, and content moderation (<xref rid="R1" ref-type="bibr">Angwin et al., 2016</xref>; <xref rid="R3" ref-type="bibr">Balashankar &#x0026; Lees, 2022</xref>). It is of societal and ethical importance to examine whether an AI system is discriminatory and to develop methods to make it fair with respect to gender, ethnicity, or other identity attributes (<xref rid="R21" ref-type="bibr">Ekstrand et al., 2022</xref>). To connect fairness concepts with statistical measures in machine learning, Mitchell <italic>et al</italic>. (<xref rid="R44" ref-type="bibr">Mitchell et al., 2021</xref>) synthesize fairness measures based on the confusion matrix. Friedler <italic>et al</italic>. (<xref rid="R26" ref-type="bibr">Friedler et al., 2019</xref>) further group fairness measures into three broad categories: 1) measures based on base rates, such as Disparate Impact (<xref rid="R22" ref-type="bibr">Feldman et al., 2015</xref>), 2) measures based on group-conditioned accuracy, and 3) measures based on group-conditioned calibration.</p>
</sec>
<sec id="sec2_1_2">
<title>Pareto optimization of trade-offs</title>
<p>Multi-Objective Optimization (MOO) is increasingly pursued in fair classification (<xref rid="R11" ref-type="bibr">Caton &#x0026; Haas, 2020</xref>). The complexity of real-world problems often leads to competing objectives such as accuracy vs. fairness, and Pareto frameworks are powerful tools for balancing between such competing objectives. Several works (<xref rid="R4" ref-type="bibr">Balashankar et al., 2019</xref>; <xref rid="R42" ref-type="bibr">Martinez et al., 2020</xref>) seek to balance accuracy vs. fairness. Valdivia et al. (<xref rid="R61" ref-type="bibr">Valdivia et al., 2020</xref>) present a group-fairness-based trade-off model for decision tree classifiers via a genetic algorithm. Wei et al. (<xref rid="R63" ref-type="bibr">Wei &#x0026; Niethammer, 2020</xref>) use Chebyshev scalarization to provide a neural architecture for computing the fairness vs. accuracy Pareto front in classification. Lin et al. (X. <xref rid="R38" ref-type="bibr">Lin et al., 2019</xref>) claim Pareto optimality on the basis of KKT conditions. In this work, we adopt Gupta et al. (<xref rid="R29" ref-type="bibr">Gupta et al., 2022</xref>)&#x2019;s SUHNPF framework, given its error tolerance bounds and strong empirical performance. We apply it as a HyperNetwork (<xref rid="R30" ref-type="bibr">Ha et al., 2017</xref>) to optimize a variety of neural network models for TL detection. While we only optimize the Pareto trade-off between a single accuracy measure vs. a single fairness measure, the framework itself is more general and directly supports optimizing arbitrary numbers of competing objectives (and constraints).</p>
</sec>
<sec id="sec2_1_3">
<title>Group accuracy parity (GAP)</title>
<p>In this work, we focus on <italic>accuracy parity</italic> (AP) (<xref rid="R67" ref-type="bibr">Zhao et al., 2020</xref>), <italic>i.e.,</italic> balancing accuracy across groups (subpopulations based on some demographic criteria), sometimes known as <italic>equal accuracy</italic> (<xref rid="R44" ref-type="bibr">Mitchell et al., 2021</xref>), <italic>equality of accuracy</italic> (<xref rid="R32" ref-type="bibr">Heidari et al., 2019</xref>), <italic>overall accuracy equality</italic> (<xref rid="R7" ref-type="bibr">Berk et al., 2021</xref>), <italic>accuracy equity</italic> (<xref rid="R20" ref-type="bibr">Dieterich et al., 2016</xref>), or <italic>accuracy difference</italic> (<xref rid="R16" ref-type="bibr">Das et al., 2021</xref>). We do not claim any primacy of this particular notion of fairness, but show that if one is interested in it, it can be directly optimized via our Group Accuracy Parity (GAP) measure without <italic>metric divergence</italic> (<xref rid="R43" ref-type="bibr">Metzler &#x0026; Croft, 2007</xref>; <xref rid="R45" ref-type="bibr">Morgan et al., 2004</xref>; <xref rid="R60" ref-type="bibr">Swezey et al., 2021</xref>; <xref rid="R65" ref-type="bibr">Yue et al., 2007</xref>) between loss function <italic>vs.</italic> evaluation metric.</p>
</sec>
<sec id="sec2_1_4">
<title>Accuracy difference</title>
<p>While AP is an equality condition, we still need to quantify the deviation from equality in cases of unequal performance across groups. We therefore use Accuracy Difference (AD) (<xref rid="R16" ref-type="bibr">Das et al., 2021</xref>), a continuous version of AP, to measure this deviation. AD is shown in (Eq. 1), where &#x0177;, y, g are the predicted label, true label, and group attribute respectively.</p>
<disp-formula><label>(1)</label><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" display="block"><mml:mrow><mml:mi>A</mml:mi><mml:mi>D</mml:mi><mml:mo>=</mml:mo><mml:munder><mml:munder><mml:mrow><mml:mi>P</mml:mi><mml:mfenced close="]" open="["><mml:mrow><mml:mover accent='true'><mml:mi>y</mml:mi><mml:mo>&#x005E;</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:mi>y</mml:mi><mml:mo>&#x007C;</mml:mo><mml:mi>g</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:mfenced></mml:mrow><mml:mo stretchy='true'>&#xFE38;</mml:mo></mml:munder><mml:mrow><mml:mtext>Acc</mml:mtext><mml:mtext>&#x2009;</mml:mtext><mml:mtext>Group</mml:mtext><mml:mtext>&#x2009;</mml:mtext><mml:mtext>1</mml:mtext><mml:mfenced><mml:mrow><mml:mtext>g=1</mml:mtext></mml:mrow></mml:mfenced></mml:mrow></mml:munder><mml:mo>&#x2212;</mml:mo><mml:munder><mml:munder><mml:mrow><mml:mi>P</mml:mi><mml:mfenced close="]" open="["><mml:mrow><mml:mover accent='true'><mml:mi>y</mml:mi><mml:mo>&#x005E;</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:mi>y</mml:mi><mml:mo>&#x007C;</mml:mo><mml:mi>g</mml:mi><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:mfenced></mml:mrow><mml:mo stretchy='true'>&#xFE38;</mml:mo></mml:munder><mml:mrow><mml:mtext>Acc</mml:mtext><mml:mtext>&#x2009;</mml:mtext><mml:mtext>Group</mml:mtext><mml:mtext>&#x2009;</mml:mtext><mml:mtext>0</mml:mtext><mml:mfenced><mml:mrow><mml:mtext>g=0</mml:mtext></mml:mrow></mml:mfenced></mml:mrow></mml:munder></mml:mrow></mml:math></disp-formula>
<p>Because AD is defined from the confusion matrix, its formulation is probabilistic in nature, <italic>i.e.,</italic> a ratio of counts over the dataset rather than a distribution over variables, and AD is therefore non-differentiable. Thus, AD can only be used in a post-hoc manner and cannot be directly used for gradient-based back-propagation. Furthermore, Eq. 1 inherently assumes that the majority group accuracy (<italic>g</italic> = 1) will always be higher than that of the vulnerable group (<italic>g</italic> = 0), which might not always hold, so AD can take negative values within its range of [-1,1]. Naturally, as a post-hoc measure, AD is disconnected from the optimization objective of the model used during training. These limitations motivated us to define a differentiable, non-probabilistic form of AD that we refer to as Group Accuracy Parity (GAP), which allows any descent-based model to optimize toward equal accuracy across sensitive attribute classes during training, and which addresses the range issue of AD.</p>
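<p>To illustrate why AD resists gradient-based training, the following is a minimal NumPy sketch of the post-hoc computation in Eq. 1 (the function name and toy data are ours, for illustration only):</p>

```python
import numpy as np

def accuracy_difference(y_true, y_pred, g):
    """AD (Eq. 1): accuracy on group g=1 minus accuracy on group g=0."""
    y_true, y_pred, g = map(np.asarray, (y_true, y_pred, g))
    acc_g1 = np.mean(y_pred[g == 1] == y_true[g == 1])  # Acc Group 1
    acc_g0 = np.mean(y_pred[g == 0] == y_true[g == 0])  # Acc Group 0
    return acc_g1 - acc_g0

# AD is a ratio of counts over hard predictions, so it is piecewise-constant
# in the model parameters: its gradient is zero almost everywhere and it
# cannot drive back-propagation. It can also be negative, e.g.:
ad = accuracy_difference([1, 0, 1, 0], [1, 0, 0, 1], [0, 0, 1, 1])  # -1.0
```

<p>Here group 0 is classified perfectly while group 1 is entirely misclassified, yielding AD = -1, the lower end of its range.</p>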
</sec>
<sec id="sec2_1_5">
<title>Formulation</title>
<p>Binary Cross Entropy (BCE), as formulated in Eq. 2, is typically used as the loss function for optimizing a classifier. Although the correspondence is not strictly one-to-one, minimizing BCE is observed to maximize Accuracy.</p>
<disp-formula><label>(2)</label><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" display="block"><mml:mrow><mml:mi>B</mml:mi><mml:mi>C</mml:mi><mml:mi>E</mml:mi><mml:mo>=</mml:mo><mml:mo>&#x2212;</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mi>N</mml:mi></mml:mfrac><mml:mstyle displaystyle='true'><mml:munder><mml:mo>&#x2211;</mml:mo><mml:mi>N</mml:mi></mml:munder><mml:mrow><mml:mi>y</mml:mi><mml:mtext>&#x2009;</mml:mtext><mml:mi>log</mml:mi><mml:mfenced><mml:mover accent='true'><mml:mi>y</mml:mi><mml:mo>&#x005E;</mml:mo></mml:mover></mml:mfenced><mml:mo>+</mml:mo><mml:mfenced><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mi>y</mml:mi></mml:mrow></mml:mfenced><mml:mi>log</mml:mi><mml:mfenced><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mover accent='true'><mml:mi>y</mml:mi><mml:mo>&#x005E;</mml:mo></mml:mover></mml:mrow></mml:mfenced></mml:mrow></mml:mstyle></mml:mrow></mml:math></disp-formula>
<p>Weighted Cross Entropy (WCE) is a variant of BCE that re-weights the error for the different classes in proportion to the inverse frequency of their labels in the data. This class re-weighting strategy is available in packages like scikit-learn (<xref rid="R49" ref-type="bibr">Pedregosa et al., 2011</xref>) and is discussed in detail by Lin <italic>et al</italic>. (T.-Y. <xref rid="R36" ref-type="bibr">Lin et al., 2017</xref>). For balanced classification across sensitive attributes (<italic>e.g.,</italic> demographic information across author groups, or gender information across targets in hate speech), we formulate our GAP loss function as follows: we first calculate the WCE for each sensitive attribute group (<italic>g</italic>), then minimize the difference between them. The GAP loss function in Eq. 3 is minimized only when the WCE errors match across the binary sensitive attribute.</p>
<disp-formula><label>(3)</label><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" display="block"><mml:mrow><mml:mi>G</mml:mi><mml:mi>A</mml:mi><mml:mi>P</mml:mi><mml:mo>=</mml:mo><mml:msubsup><mml:mrow><mml:mfenced close="&#x2016;" open="&#x2016;"><mml:mrow><mml:munder><mml:munder><mml:mrow><mml:mi>W</mml:mi><mml:mi>C</mml:mi><mml:mi>E</mml:mi><mml:mfenced><mml:mrow><mml:mi>g</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:mfenced></mml:mrow><mml:mo stretchy='true'>&#xFE38;</mml:mo></mml:munder><mml:mrow><mml:mtext>WCE</mml:mtext><mml:mtext>&#x2009;</mml:mtext><mml:mtext>Group</mml:mtext><mml:mtext>&#x2009;</mml:mtext><mml:mtext>1</mml:mtext></mml:mrow></mml:munder><mml:mo>&#x2212;</mml:mo><mml:munder><mml:munder><mml:mrow><mml:mi>W</mml:mi><mml:mi>C</mml:mi><mml:mi>E</mml:mi><mml:mfenced><mml:mrow><mml:mi>g</mml:mi><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:mfenced></mml:mrow><mml:mo stretchy='true'>&#xFE38;</mml:mo></mml:munder><mml:mrow><mml:mtext>WCE</mml:mtext><mml:mtext>&#x2009;</mml:mtext><mml:mtext>Group</mml:mtext><mml:mtext>&#x2009;</mml:mtext><mml:mtext>0</mml:mtext></mml:mrow></mml:munder></mml:mrow></mml:mfenced></mml:mrow><mml:mn>2</mml:mn><mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:math></disp-formula>
<p>The GAP function has the following properties:</p>
<list list-type="order">
<list-item><p>GAP maps to AD. GAP has a one-to-one correspondence to AD, <italic>i.e.,</italic> minimizing GAP also minimizes AD.</p></list-item>
<list-item><p>GAP is differentiable. GAP is defined as the squared 2-norm difference between the Weighted Cross Entropy (WCE) across the two sensitive attribute groups. Since WCE is differentiable, so is the squared 2-norm of their difference. Hence GAP can be used to optimize any descent-based model.</p></list-item>
<list-item><p>GAP is symmetric. GAP has a 2-norm formulation, ensuring the range of attainable values is within <italic>GAP</italic> &#x2208; [0<italic>,</italic>1], avoiding the negativity issue faced by AD. Also, being a 2-norm measure, the loss surface of GAP is smoother than that of comparable measures like CLA (<xref rid="R56" ref-type="bibr">Shen et al., 2022</xref>), which uses the 1-norm (<xref rid="R10" ref-type="bibr">Boyd et al., 2004</xref>).</p></list-item>
</list>
<p>For a step-by-step derivation from WCE to GAP, readers are referred to Appendix A, showing the strict correspondence between the loss measures. In this paper we implement GAP (Eq. 3) to correspond to AD (Eq. 1). As such, GAP can be optimized over binary labels and binary groups.</p>
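<p>The formulation above can be sketched in a few lines. The following NumPy version (function names are ours; a practical implementation would express the same computation in an autodiff framework such as PyTorch so that the gradient of GAP flows back to the model parameters) computes the group-wise WCE values and their squared difference per Eq. 3:</p>

```python
import numpy as np

def wce(y, p, w1=1.0, w0=1.0):
    """Weighted cross-entropy over soft predictions p, with per-class weights w1, w0."""
    p = np.clip(np.asarray(p, float), 1e-7, 1 - 1e-7)  # guard log(0)
    y = np.asarray(y, float)
    return -np.mean(w1 * y * np.log(p) + w0 * (1 - y) * np.log(1 - p))

def gap_loss(y, p, g, w1=1.0, w0=1.0):
    """GAP (Eq. 3): squared difference between the WCE of group g=1 and group g=0."""
    y, p, g = np.asarray(y), np.asarray(p, float), np.asarray(g)
    return (wce(y[g == 1], p[g == 1], w1, w0) - wce(y[g == 0], p[g == 0], w1, w0)) ** 2

# GAP is composed of smooth functions of the soft predictions p, so it is
# differentiable; it vanishes exactly when the two group-wise WCEs match.
```

<p>A model whose per-group errors already match incurs zero GAP loss, while one that fits one group better than the other is penalized quadratically in the gap between the two WCE values.</p>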
</sec>
</sec>
</sec>
<sec id="sec3">
<title>Optimizing competing objectives</title>
<p>Typically, toxicity detection systems are trained with the single objective of maximizing OA (<xref rid="R24" ref-type="bibr">Founta et al., 2018</xref>; <xref rid="R48" ref-type="bibr">Park et al., 2018</xref>; <xref rid="R52" ref-type="bibr">R&#x00F6;ttger et al., 2021</xref>) or a custom-defined objective (<xref rid="R64" ref-type="bibr">Xia et al., 2020</xref>). In contrast, we frame toxicity detection as a Multi-Objective Optimization (MOO) problem. It is important to highlight the distinction between a Multi-Objective (MOO) vs. a Single-Objective (SOO) formulation and their interpretations. Consider the two objectives f1: Cross-Entropy and f2: Fairness. Traditional fair classifiers operate by adding a penalty term corresponding to Fairness to the main Cross-Entropy objective, weighted by a hyper-parameter &#x03BB;, as in Eq. 4.</p>
<disp-formula><label>(4)</label><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" display="block"><mml:mrow><mml:mtable columnalign='left' equalrows='true' equalcolumns='true'><mml:mtr columnalign='left'><mml:mtd columnalign='left'><mml:mrow><mml:mi>min</mml:mi></mml:mrow></mml:mtd><mml:mtd columnalign='left'><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mo>+</mml:mo><mml:mi>&#x03BB;</mml:mi><mml:msub><mml:mi>f</mml:mi><mml:mn>2</mml:mn></mml:msub></mml:mrow></mml:mtd></mml:mtr><mml:mtr columnalign='left'><mml:mtd columnalign='left'><mml:mrow><mml:mi>min</mml:mi></mml:mrow></mml:mtd><mml:mtd columnalign='left'><mml:mrow><mml:mtext>Cross-Entropy</mml:mtext><mml:mtext>&#x2009;</mml:mtext><mml:mtext>loss</mml:mtext><mml:mo>+</mml:mo><mml:mtext>Fairness</mml:mtext><mml:mtext>&#x2009;</mml:mtext><mml:mtext>loss</mml:mtext></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:math></disp-formula>
<p>Note that such an optimization process has no control over the range of <italic>&#x03BB;</italic>, which can generally vary over (0<italic>,</italic>&#x221E;). In the SOO setting, we tune <italic>&#x03BB;</italic> until we obtain the desired performance. Furthermore, there is no explicit requirement that <italic>f</italic>1 and <italic>f</italic>2 be on the same scale. Thus, there is no simple correspondence between the amount of Fairness we want <italic>vs.</italic> the value of <italic>&#x03BB;</italic>.</p>
<p>An unconstrained MOO problem with two competing loss objectives is defined in Eq. 5. Note that this is a joint min-min problem instead of a single min problem, and the objectives need to be at the same scale <italic>w.r.t.</italic> each other. If the expectation is to achieve a linear trade-off between them, the linearly scalarized form of the MOO problem, with trade-off <italic>&#x03B1;</italic> &#x2208; [0<italic>,</italic>1], minimizes both objectives simultaneously in Eq. 6. Solving this reformulated MOO problem achieves a balance between Entropy and Fairness, with <italic>&#x03B1;</italic> holding a strict mathematical interpretation as a linear trade-off: decreasing the Entropy loss causes the Fairness loss to increase, while decreasing the Fairness loss causes the Entropy loss to increase.</p>
<disp-formula><label>(5)</label><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" display="block"><mml:mrow><mml:mtable columnalign='left' equalrows='true' equalcolumns='true'><mml:mtr columnalign='left'><mml:mtd columnalign='left'><mml:mrow><mml:mi>min</mml:mi></mml:mrow></mml:mtd><mml:mtd columnalign='left'><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>f</mml:mi><mml:mn>2</mml:mn></mml:msub></mml:mrow></mml:mtd></mml:mtr><mml:mtr columnalign='left'><mml:mtd columnalign='left'><mml:mrow><mml:mi>min</mml:mi></mml:mrow></mml:mtd><mml:mtd columnalign='left'><mml:mrow><mml:mi>&#x03B1;</mml:mi><mml:msub><mml:mi>f</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mo>+</mml:mo><mml:mfenced><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mi>&#x03B1;</mml:mi></mml:mrow></mml:mfenced><mml:msub><mml:mi>f</mml:mi><mml:mn>2</mml:mn></mml:msub></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:math></disp-formula>
<disp-formula><label>(6)</label><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" display="block"><mml:mrow><mml:mtable equalrows='true' equalcolumns='true'><mml:mtr><mml:mtd><mml:mrow><mml:mi>min</mml:mi></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mi>&#x03B1;</mml:mi><mml:mtext>&#x2009;</mml:mtext><mml:mtext>Cross-Entropy</mml:mtext><mml:mtext>&#x2009;</mml:mtext><mml:mtext>loss</mml:mtext><mml:mo>+</mml:mo><mml:mfenced><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mi>&#x03B1;</mml:mi></mml:mrow></mml:mfenced><mml:mtext>&#x2009;</mml:mtext><mml:mtext>Fairness</mml:mtext><mml:mtext>&#x2009;</mml:mtext><mml:mtext>loss</mml:mtext></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:math></disp-formula>
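<p>To make the linear scalarization of Eq. 6 concrete, the following is a minimal numeric sketch (illustrative only, not the paper&#x2019;s model): two toy quadratic objectives stand in for the Cross-Entropy and Fairness losses, and each value of the trade-off <italic>&#x03B1;</italic> yields a different optimal solution, tracing out the Pareto front.</p>

```python
# Two competing toy objectives standing in for the Cross-Entropy and
# Fairness losses: f1 is minimized at w = 0, f2 at w = 1, so no single
# w minimizes both at once.
def f1(w):
    return w ** 2

def f2(w):
    return (w - 1.0) ** 2

def solve_scalarized(alpha, lr=0.1, steps=500):
    """Minimize alpha * f1(w) + (1 - alpha) * f2(w) by gradient descent,
    mirroring the scalarized form in Eq. 6."""
    w = 0.5
    for _ in range(steps):
        grad = alpha * 2 * w + (1 - alpha) * 2 * (w - 1.0)
        w -= lr * grad
    return w

# Sweeping alpha traces the Pareto front; for these toy objectives the
# optimum is analytically w* = 1 - alpha.
front = [(a, solve_scalarized(a)) for a in (0.0, 0.25, 0.5, 0.75, 1.0)]
```

<p>Retraining from scratch for every <italic>&#x03B1;</italic>, as in this sketch, is exactly the per-trade-off cost that motivates amortizing the search over <italic>&#x03B1;</italic>.</p>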
<p>Note that Eq. 6 has multiple mathematically optimal solutions: the optimal solution corresponding to each value of <italic>&#x03B1;</italic> is a member of the Pareto optimal solution set, <italic>i.e.,</italic> the Pareto front contains the set of optimal model parameters given the dataset and the model. To solve this MOO problem, we adopt the SUHNPF Pareto framework (<xref rid="R29" ref-type="bibr">Gupta et al., 2022</xref>) as a hypernetwork (<xref rid="R30" ref-type="bibr">Ha et al., 2017</xref>) to learn optimal toxic language (TL) detection neural model parameters over trade-offs. Hypernetworks train one neural model to generate effective weights for a second, target model.</p>
<p>SUHNPF efficiently learns the entire Pareto manifold of feasible trade-off values during training. This empowers users to choose any solution point they prefer on the manifold, <italic>a posteriori</italic>, and extract the classifier weight configuration for their desired trade-off <italic>&#x03B1;</italic>, without retraining the model for that <italic>&#x03B1;</italic>. Training the same model for <italic>K</italic> different <italic>&#x03B1;</italic>&#x2019;s, with <italic>R</italic> being the time for a single run, would result in a total runtime of <italic>K</italic> &#x00D7; <italic>R</italic>, <italic>i.e.,</italic> linear in the number of runs. Using the hypernetwork to learn the manifold is computationally much more efficient, taking a near-constant time <italic>c</italic> &#x00D7; <italic>R</italic>, with 1 <italic>&#x003C; c</italic> &#x226A; <italic>K</italic>, over the entire range of feasible <italic>&#x03B1;</italic>&#x2019;s rather than for each value of <italic>&#x03B1;</italic> separately. Refer to Appendix D for runtime values.</p>
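<p>The amortization argument above can be conveyed with a toy sketch (illustrative only; SUHNPF itself is a neural hypernetwork, and all names below are hypothetical). Instead of rerunning the solver for each of <italic>K</italic> trade-offs, a single model <italic>w</italic>(<italic>&#x03B1;</italic>) is trained once over randomly sampled trade-offs and can then be queried for any <italic>&#x03B1;</italic> <italic>a posteriori</italic>.</p>

```python
import random

# Toy objectives: f1(w) = w^2 is minimized at w = 0, f2(w) = (w - 1)^2
# at w = 1. The "hypernetwork" here is just a linear map
# w(alpha) = a * alpha + b, trained once over sampled trade-offs.
random.seed(0)
a, b, lr = 0.0, 0.0, 0.05
for _ in range(4000):
    alpha = random.random()          # sample a trade-off uniformly
    w = a * alpha + b                # generated target "weight"
    # gradient of alpha * w^2 + (1 - alpha) * (w - 1)^2 w.r.t. w
    dw = 2 * alpha * w + 2 * (1 - alpha) * (w - 1.0)
    a -= lr * dw * alpha             # chain rule through w(alpha)
    b -= lr * dw

# After this single training run, any alpha can be queried without
# retraining; the learned map approximates the front w*(alpha) = 1 - alpha.
```

<p>One run over sampled trade-offs replaces <italic>K</italic> independent runs, which is the intuition behind the <italic>c</italic> &#x00D7; <italic>R</italic> versus <italic>K</italic> &#x00D7; <italic>R</italic> comparison.</p>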
</sec>
<sec id="sec4">
<title>Experimental details</title>
<p>In this section, we describe our datasets, neural models, baseline losses and other evaluation details.</p>
<sec id="sec4_1">
<title>Datasets</title>
<p>We consider two datasets: Davidson et al. (<xref rid="R18" ref-type="bibr">Davidson et al., 2017</xref>) for author demographics and the Civil Comments (<xref rid="R9" ref-type="bibr">Borkan et al., 2019</xref>) portion of Wilds (<xref rid="R34" ref-type="bibr">Koh et al., 2021</xref>) for target demographics (<xref ref-type="table" rid="T1">Table 1</xref>). In each case, we frame the task as a binary classification problem (Toxic vs. non-Toxic, or &#x201C;safe&#x201D;) with a binary sensitive attribute (Majority vs. Minority, where Minority denotes the under-represented group). Note that &#x201C;Majority&#x201D; and &#x201C;Minority&#x201D; in our work simply refer to the statistical representation of each group in the data and do not carry any social or cultural meaning.</p>
<table-wrap id="T1">
<label>Table 1.</label>
<caption><p>Statistics of the two datasets used in this work. For Davidson et al. (<xref rid="R18" ref-type="bibr">Davidson et al., 2017</xref>), we consider the author demographics AAE vs. SAE as group attribute for minority vs. majority group. For Wilds (<xref rid="R34" ref-type="bibr">Koh et al., 2021</xref>), we consider the binary group target gender as male vs. female for minority vs. majority group attributes.</p></caption>
<table>
<thead>
<tr>
<th align="center" valign="top">Dataset</th>
<th align="center" valign="top">Group</th>
<th align="center" valign="top">Toxic</th>
<th align="center" valign="top">Safe</th>
<th align="center" valign="top">Total</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center" valign="top" rowspan="2">Davidson</td>
<td align="center" valign="top">Minority</td>
<td align="center" valign="top">8,725</td>
<td align="center" valign="top">302</td>
<td align="center" valign="top">9,027 (36%)</td>
</tr>
<tr>
<td align="center" valign="top">Majority</td>
<td align="center" valign="top">11,895</td>
<td align="center" valign="top">3,861</td>
<td align="center" valign="top">15,756 (64%)</td>
</tr>
<tr>
<td align="center" valign="top" rowspan="2">Wilds</td>
<td align="center" valign="top">Minority</td>
<td align="center" valign="top">5,973</td>
<td align="center" valign="top">33,762</td>
<td align="center" valign="top">39,735 (44%)</td>
</tr>
<tr>
<td align="center" valign="top">Majority</td>
<td align="center" valign="top">6,832</td>
<td align="center" valign="top">42,950</td>
<td align="center" valign="top">49,782 (56%)</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Author Demographics Dataset. We consider fair moderation of posts written by authors from different demographic groups in (<xref rid="R18" ref-type="bibr">Davidson et al., 2017</xref>). Prior studies (<xref rid="R2" ref-type="bibr">Arango et al., 2019</xref>; <xref rid="R54" ref-type="bibr">Sap et al., 2019</xref>) have empirically demonstrated the existence of bias towards author demographics in toxic language classification. The sensitive attribute in this dataset is race, as identified by the dialect of the tweets. Following prior work, we apply Blodgett et al. (<xref rid="R8" ref-type="bibr">Blodgett et al., 2017</xref>)&#x2019;s model to automatically detect dialect labels for each tweet as African American English (AAE) or Standard American English (SAE), representing the Minority and Majority groups, respectively. We acknowledge both that dialect is only a weak surrogate for demographic race and that automatic detection of dialect naturally incurs noise; in this, however, we follow established practice from prior work. Our fairness methods are agnostic to the sensitive attribute labelled in the data, and our results are only intended to attest to the capabilities of our proposed methods, rather than to provide findings regarding protection of any specific vulnerable population. Davidson et al. (<xref rid="R18" ref-type="bibr">Davidson et al., 2017</xref>)&#x2019;s data includes 24,783 Twitter posts labelled as Hate, Offensive, or Normal. Following prior work (<xref rid="R48" ref-type="bibr">Park et al., 2018</xref>), we set the class label to 1 (Toxic) if the post contains hate speech or offensive language, and 0 otherwise. We note that tweets from Minority authors are annotated as toxic in 96% of cases, compared to 75% of tweets by Majority authors. While these statistics suggest an important risk of annotation bias in this dataset, dataset debiasing lies beyond the scope of our work. Our focus here is restricted to balancing accuracy across the groups, given the dataset as it is annotated.</p>
<p>Target Identity Dataset. To assess fair protection of different groups targeted in posts, we use the Civil Comments (<xref rid="R9" ref-type="bibr">Borkan et al., 2019</xref>) portion of Wilds (<xref rid="R34" ref-type="bibr">Koh et al., 2021</xref>). This dataset has 448,000 training comments labelled as Toxic or non-Toxic. Each comment has an explicit annotation for the demographics, gender, or religion of the target entity. We select comments where more than 50% of annotators agreed on the gender of the target. In this work, we include only the female (majority) and male (minority) genders in order to construct a binary sensitive attribute for our experiments. In doing so, we fully acknowledge both the non-binary nature of gender and individual freedom of self-identification. As noted above, our methods are agnostic to the sensitive attribute labelled in the data, and our inclusion of only two genders merely reflects a convenient way to assess the capabilities of our proposed methods with regard to balancing accuracy across a binary sensitive attribute.</p>
</sec>
<sec id="sec4_2">
<title>Neural models considered</title>
<p>To assess the generality of our methods across distinct neural architectures, we evaluate over three types of models: CNN (<xref rid="R27" ref-type="bibr">Gamb&#x00E4;ck &#x0026; Sikdar, 2017</xref>), BiLSTM (<xref rid="R28" ref-type="bibr">Graves et al., 2005</xref>) and BERT (<xref rid="R19" ref-type="bibr">Devlin et al., 2018</xref>). For full experimental setup, please refer to Appendix C. For all three models, we freeze the feature representation layers and optimize the weights of the classification layer. In general, GAP loss optimization and the SUHNPF hypernetwork (<xref rid="R29" ref-type="bibr">Gupta et al., 2022</xref>) support such generalization across any models that can be trained via gradient descent.</p>
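<p>As a hedged sketch of this setup (hypothetical stand-ins, not the actual CNN/BiLSTM/BERT code): below, a fixed random projection plays the role of the frozen feature-representation layers, and only a logistic classification layer is optimized by gradient descent, mirroring the freeze-then-train-head procedure described above.</p>

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: the frozen "encoder" is a fixed random
# projection; in the paper it would be CNN/BiLSTM/BERT feature layers
# whose weights are held fixed during training.
X = rng.normal(size=(200, 32))            # raw inputs
W_frozen = rng.normal(size=(32, 16))      # frozen feature layers (never updated)
feats = np.tanh(X @ W_frozen)             # fixed representations
y = (feats @ rng.normal(size=16) > 0).astype(float)  # synthetic binary labels

# Only the classification layer (w, b) is trained, by gradient descent
# on the binary cross-entropy loss.
w, b, lr = np.zeros(16), 0.0, 0.5
for _ in range(1000):
    p = 1.0 / (1.0 + np.exp(-(feats @ w + b)))  # sigmoid predictions
    grad_logits = (p - y) / len(y)              # dBCE/dlogits
    w -= lr * feats.T @ grad_logits             # update head only
    b -= lr * grad_logits.sum()                 # W_frozen is never touched

train_acc = float(((p > 0.5) == y).mean())
```

<p>Because gradients flow only into the head, any loss that is differentiable in the head weights (including the fairness losses below) can be swapped in without changing the frozen representation.</p>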
</sec>
<sec id="sec4_3">
<title>Baseline loss functions</title>
<p>We compare against two baseline loss functions. The first fairness loss, CLAss-wise equal opportunity (CLA) (<xref rid="R56" ref-type="bibr">Shen et al., 2022</xref>), seeks to balance the False Negative Rate (FNR) across protected groups (<xref rid="R15" ref-type="bibr">Chouldechova, 2017</xref>), also known as equality of opportunity (<xref rid="R31" ref-type="bibr">Hardt et al., 2016</xref>). CLA minimizes the absolute difference between the error <italic>w.r.t.</italic> a label (<italic>BCE</italic>(<italic>y</italic>)) and the error <italic>w.r.t.</italic> a label given the sensitive attribute (<italic>BCE</italic>(<italic>y, g</italic>)), with hyperparameter <italic>&#x03BB;</italic> &#x2208; [0<italic>,</italic> &#x221E;), which differs from minimizing AD. Due to the &#x2113;<sub>1</sub>-norm nature of CLA, the optimization surface of the loss function is not smooth (<xref rid="R10" ref-type="bibr">Boyd et al., 2004</xref>).</p>
<disp-formula><label>(7)</label><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" display="block"><mml:mrow><mml:mi>C</mml:mi><mml:mi>L</mml:mi><mml:mi>A</mml:mi><mml:mo>=</mml:mo><mml:mi>B</mml:mi><mml:mi>C</mml:mi><mml:mi>E</mml:mi><mml:mo>+</mml:mo><mml:mi>&#x03BB;</mml:mi><mml:mo>&#x22C5;</mml:mo><mml:mstyle displaystyle='true'><mml:munder><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>y</mml:mi><mml:mo>&#x2208;</mml:mo><mml:mi>C</mml:mi></mml:mrow></mml:munder><mml:mrow><mml:mstyle displaystyle='true'><mml:munder><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>g</mml:mi><mml:mo>&#x2208;</mml:mo><mml:mi>G</mml:mi></mml:mrow></mml:munder><mml:mrow><mml:mo>&#x007C;</mml:mo><mml:mi>B</mml:mi><mml:mi>C</mml:mi><mml:mi>E</mml:mi><mml:mfenced><mml:mrow><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mi>g</mml:mi></mml:mrow></mml:mfenced><mml:mo>&#x2212;</mml:mo><mml:mi>B</mml:mi><mml:mi>C</mml:mi><mml:mi>E</mml:mi><mml:mfenced><mml:mi>y</mml:mi></mml:mfenced></mml:mrow></mml:mstyle></mml:mrow></mml:mstyle><mml:mo>&#x007C;</mml:mo></mml:mrow></mml:math></disp-formula>
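<p>Eq. 7 can be read off almost directly in code. The following numeric sketch (variable names are ours, not the authors&#x2019; implementation) computes the overall BCE plus the <italic>&#x03BB;</italic>-weighted sum of absolute gaps between each per-class, per-group conditional BCE and that class&#x2019;s BCE.</p>

```python
import numpy as np

def bce(p, y):
    """Binary cross-entropy over a subset of examples."""
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def cla_loss(p, y, g, lam=1.0):
    """Sketch of Eq. 7: overall BCE plus lambda times the summed absolute
    gaps |BCE(y, g) - BCE(y)| over classes y in C and groups g in G."""
    total = bce(p, y)
    penalty = 0.0
    for c in (0, 1):                      # classes y in C
        cls = (y == c)
        for grp in (0, 1):                # groups g in G
            sel = cls & (g == grp)
            if sel.any():
                penalty += abs(bce(p[sel], y[sel]) - bce(p[cls], y[cls]))
    return total + lam * penalty

# Toy predictions: the model is systematically less confident on group 1,
# so the class-conditional group BCEs differ and the penalty is nonzero.
p = np.array([0.9, 0.8, 0.7, 0.6, 0.2, 0.1, 0.4, 0.3])
y = np.array([1,   1,   1,   1,   0,   0,   0,   0])
g = np.array([0,   0,   1,   1,   0,   0,   1,   1])
```

<p>Setting <italic>&#x03BB;</italic> = 0 recovers plain BCE; larger <italic>&#x03BB;</italic> penalizes the per-group error gaps more heavily, at an uncontrolled scale relative to the BCE term, as discussed for Eq. 4.</p>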
<p>The second fairness loss (<xref rid="R64" ref-type="bibr">Xia et al., 2020</xref>) is an adversarial approach to demoting unfairness, which we denote ADV. It seeks false positive rate (FPR) balance (<xref rid="R15" ref-type="bibr">Chouldechova, 2017</xref>) across groups, otherwise known as <italic>predictive equality</italic>. Being adversarial in nature, this method, like others (<xref rid="R13" ref-type="bibr">Chen et al., 2024</xref>), does not correspond directly to any evaluation measure; users should therefore be cautious of possible metric divergence when using such techniques. The loss uses a tuning hyperparameter <italic>&#x03B2;</italic> &#x2208; [0,1].</p>
<disp-formula><label>(8)</label><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" display="block"><mml:mrow><mml:mi>A</mml:mi><mml:mi>D</mml:mi><mml:mi>V</mml:mi><mml:mo>=</mml:mo><mml:mi>&#x03B2;</mml:mi><mml:mo>&#x22C5;</mml:mo><mml:mi>B</mml:mi><mml:mi>C</mml:mi><mml:mi>E</mml:mi><mml:mo>+</mml:mo><mml:mfenced><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mi>&#x03B2;</mml:mi></mml:mrow></mml:mfenced><mml:mo>&#x22C5;</mml:mo><mml:mfenced><mml:mrow><mml:mi>a</mml:mi><mml:mi>d</mml:mi><mml:mi>v</mml:mi><mml:mi>e</mml:mi><mml:mi>r</mml:mi><mml:mi>s</mml:mi><mml:mi>a</mml:mi><mml:mi>r</mml:mi><mml:mi>y</mml:mi><mml:mfenced><mml:mrow><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mi>g</mml:mi></mml:mrow></mml:mfenced><mml:mo>&#x2212;</mml:mo><mml:mn>0.5</mml:mn></mml:mrow></mml:mfenced></mml:mrow></mml:math></disp-formula>
<p>However, while ADV is motivated by FPR balance, no equivalence between the loss function and the evaluation metric is shown, exemplifying <italic>metric divergence</italic> between loss function and evaluation goal. Their reported results also show only limited empirical correspondence between reducing the model loss and reducing FPR.</p>
</sec>
<sec id="sec4_4">
<title>Experimental setup</title>
<p>We use two experimental setups, with the weighted cross entropy (WCE) as <italic>f</italic>1 and the fairness criterion as <italic>f</italic>2. First, we optimize the fairness measure directly as an SOO problem following Eq. 4 under a penalization setting, as proposed in CLA (<xref rid="R56" ref-type="bibr">Shen et al., 2022</xref>). Second, we use the MOO setting to find the best trade-offs between WCE and the fairness measure following Eq. 6, with the SOO <italic>vs.</italic> MOO distinction described in Sec 4.</p>
</sec>
<sec id="sec4_5">
<title>Evaluation measures</title>
<p>Our focus in this work is the tension between minimizing <italic>accuracy difference</italic> (AD) (<xref rid="R16" ref-type="bibr">Das et al., 2021</xref>) and maximizing overall accuracy (OA). We thus evaluate on four post-hoc measures: OA over the dataset (majority and minority groups together), accuracy of each group separately, and AD observed between groups. Although we do not directly optimize F1, since a differentiable version of F1 does not exist, we still report the values in Appendix E.</p>
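<p>The four post-hoc measures can be computed with a simple sketch (a hypothetical helper of ours; binary labels and a binary group attribute assumed, with 0 denoting the majority group):</p>

```python
import numpy as np

def evaluation_measures(y_true, y_pred, group):
    """Post-hoc measures used here: overall accuracy (OA), per-group
    accuracy, and the accuracy difference (AD) between the two groups."""
    acc = lambda mask: float((y_pred[mask] == y_true[mask]).mean())
    maj, mino = (group == 0), (group == 1)
    return {
        "OA": float((y_pred == y_true).mean()),
        "majority": acc(maj),
        "minority": acc(mino),
        "AD": abs(acc(maj) - acc(mino)),
    }

# Toy example: four majority and four minority examples, with errors
# occurring only on the minority group.
y_true = np.array([1, 1, 0, 0, 1, 1, 0, 0])
y_pred = np.array([1, 1, 0, 0, 1, 0, 0, 1])
group  = np.array([0, 0, 0, 0, 1, 1, 1, 1])
m = evaluation_measures(y_true, y_pred, group)
```

<p>In this toy case the majority accuracy is perfect while the minority accuracy is not, so OA alone would mask the disparity that AD surfaces.</p>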
</sec>
</sec>
<sec id="sec5">
<title>Results</title>
<sec id="sec5_1">
<title>Existing bias in CNN, BiLSTM, BERT</title>
<p><xref ref-type="table" rid="T2">Table 2</xref> presents results for three toxic language classifiers optimized to maximize OA (<italic>i.e.,</italic> WCE) on Davidson <italic>et al</italic>. (<xref rid="R18" ref-type="bibr">Davidson et al., 2017</xref>)&#x2019;s dataset. The <italic>Majority</italic> group consistently shows 6-7% higher accuracy than the <italic>Minority</italic> group, across models and five random initializations. This imbalance motivates our work to jointly optimize OA and AD across demographic groups. The unequal accuracy in toxic language detection is consistent across all three neural models and both datasets. Due to space restrictions in the main body, we present results only for the BERT-based classifier; our findings also apply to the BiLSTM and CNN networks, whose results are available in Appendix F.</p>
<table-wrap id="T2">
<label>Table 2.</label>
<caption><p>Baseline accuracy results on Davidson <italic>et al</italic>. (<xref rid="R18" ref-type="bibr">Davidson et al., 2017</xref>)&#x2019;s dataset when maximizing overall accuracy (OA) only. Results show consistent bias of higher accuracy for the majority.</p></caption>
<table>
<thead>
<tr>
<th align="center" valign="top">Models</th>
<th align="center" valign="top">Overall %</th>
<th align="center" valign="top">Majority %</th>
<th align="center" valign="top">Minority %</th>
<th align="center" valign="top">AD%</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center" valign="top">CNN</td>
<td align="center" valign="top">87.52 &#x00B1; 0.3</td>
<td align="center" valign="top">89.12 &#x00B1; 0.2</td>
<td align="center" valign="top">82.88 &#x00B1; 0.3</td>
<td align="center" valign="top">6.24 &#x00B1; 0.2</td>
</tr>
<tr>
<td align="center" valign="top">BiLSTM</td>
<td align="center" valign="top">87.60 &#x00B1; 0.2</td>
<td align="center" valign="top">89.37 &#x00B1; 0.2</td>
<td align="center" valign="top">82.46 &#x00B1; 0.1</td>
<td align="center" valign="top">6.91 &#x00B1; 0.3</td>
</tr>
<tr>
<td align="center" valign="top">BERT</td>
<td align="center" valign="top">88.84 &#x00B1; 0.2</td>
<td align="center" valign="top">90.35 &#x00B1; 0.2</td>
<td align="center" valign="top">84.47 &#x00B1; 0.1</td>
<td align="center" valign="top">5.88 &#x00B1; 0.1</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="sec5_2">
<title>Single objective optimization (SOO)</title>
<p><xref ref-type="table" rid="T3">Table 3</xref> shows the results for the SOO experimental setup. The baseline BERT model optimized via Cross Entropy obtains 88.84% OA and 5.88% AD on Davidson <italic>et al</italic>. (<xref rid="R18" ref-type="bibr">Davidson et al., 2017</xref>) and 84.68% OA and 3.88% AD on Wilds (<xref rid="R34" ref-type="bibr">Koh et al., 2021</xref>). All three loss functions successfully reduce the AD on both datasets. As expected, the improvement in fairness comes at the cost of lower OA. We evaluate the different optimization metrics by looking at both the change in AD and in OA.</p>
<p>ADV performs the worst of the three measures, most notably due to its relatively large drop in OA. Optimizing for GAP and CLA yields the same OA, with no significant difference between the two losses across five initializations. However, in terms of reducing AD, our GAP measure outperforms CLA by 0.9% on Davidson and 1.5% on Wilds. GAP is thus the best-performing measure for reducing Accuracy Difference, and the results are consistent across both datasets. These results show the value of optimizing a measure that correctly reflects the desired notion of fairness, and the benefit of directly optimizing the measure of interest, rather than surrogate or approximate loss functions, so as to avoid metric divergence.</p>
<table-wrap id="T3">
<label>Table 3.</label>
<caption><p>Optimizing fairness in a SOO setup. We compare a BERT-based model trained using cross entropy (Baseline) with models trained using different fairness measures. Our proposed measure (GAP) obtains the best results in reducing AD while maintaining high overall accuracy.</p></caption>
<table>
<thead>
<tr>
<th align="center" valign="top">Measure</th>
<th align="center" valign="top">Overall %</th>
<th align="center" valign="top">Majority %</th>
<th align="center" valign="top">Minority %</th>
<th align="center" valign="top">AD%</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center" valign="top" colspan="5">Davidson</td>
</tr>
<tr>
<td align="center" valign="top">Baseline</td>
<td align="center" valign="top">88.84 &#x00B1; 0.2</td>
<td align="center" valign="top">90.35 &#x00B1; 0.2</td>
<td align="center" valign="top">84.47 &#x00B1; 0.1</td>
<td align="center" valign="top">5.88 &#x00B1; 0.1</td>
</tr>
<tr>
<td align="center" valign="top"><bold>GAP (Ours)</bold></td>
<td align="center" valign="top">87.32 &#x00B1; 0.1</td>
<td align="center" valign="top">87.35 &#x00B1; 0.1</td>
<td align="center" valign="top">87.26 &#x00B1; 0.1</td>
<td align="center" valign="top"><bold>0.09 &#x00B1; 0.0</bold></td>
</tr>
<tr>
<td align="center" valign="top">CLA</td>
<td align="center" valign="top">87.57 &#x00B1; 0.2</td>
<td align="center" valign="top">87.82 &#x00B1; 0.1</td>
<td align="center" valign="top">86.87 &#x00B1; 0.1</td>
<td align="center" valign="top">0.95 &#x00B1; 0.0</td>
</tr>
<tr>
<td align="center" valign="top">ADV</td>
<td align="center" valign="top">86.27 &#x00B1; 0.4</td>
<td align="center" valign="top">86.88 &#x00B1; 0.2</td>
<td align="center" valign="top">84.52 &#x00B1; 0.3</td>
<td align="center" valign="top">2.36 &#x00B1; 0.1</td>
</tr>
<tr>
<td align="center" valign="top" colspan="5">Wilds</td>
</tr>
<tr>
<td align="center" valign="top">Baseline</td>
<td align="center" valign="top">84.68 &#x00B1; 0.3</td>
<td align="center" valign="top">86.41 &#x00B1; 0.2</td>
<td align="center" valign="top">82.49 &#x00B1; 0.1</td>
<td align="center" valign="top">3.88 &#x00B1; 0.2</td>
</tr>
<tr>
<td align="center" valign="top"><bold>GAP (Ours)</bold></td>
<td align="center" valign="top">84.38 &#x00B1; 0.1</td>
<td align="center" valign="top">84.51 &#x00B1; 0.1</td>
<td align="center" valign="top">84.23 &#x00B1; 0.0</td>
<td align="center" valign="top"><bold>0.28 &#x00B1; 0.0</bold></td>
</tr>
<tr>
<td align="center" valign="top">CLA</td>
<td align="center" valign="top">84.43 &#x00B1; 0.1</td>
<td align="center" valign="top">85.23 &#x00B1; 0.1</td>
<td align="center" valign="top">83.41 &#x00B1; 0.0</td>
<td align="center" valign="top">1.82 &#x00B1; 0.1</td>
</tr>
<tr>
<td align="center" valign="top">ADV</td>
<td align="center" valign="top">83.61 &#x00B1; 0.2</td>
<td align="center" valign="top">84.17 &#x00B1; 0.1</td>
<td align="center" valign="top">82.91 &#x00B1; 0.1</td>
<td align="center" valign="top">1.26 &#x00B1; 0.1</td>
</tr>
</tbody>
</table>
</table-wrap>
<fig id="F1">
<label>Figure 1.</label>
<caption><p>Trade-offs between accuracy difference (AD) and overall accuracy (OA), on the BERT based model with SUHNPF acting as hypernetwork for three methods &#x2014; GAP (ours), CLA, and ADV &#x2013; across the two datasets for <italic>&#x03B1;</italic> &#x2208; [0<italic>,</italic>1], with <italic>&#x03B1;</italic> = 0 optimizing AD only and <italic>&#x03B1;</italic> = 1 optimizing OA only. GAP achieves lower AD consistently across <italic>&#x03B1;</italic> settings and datasets, while a more modest drop in OA is observed across methods as AD is reduced.</p></caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="images/c11-fig1.jpg"><alt-text>none</alt-text></graphic>
</fig>
</sec>
<sec id="sec5_3">
<title>Multi objective optimization (MOO)</title>
<p>In Section 6.2 we used GAP, CLA, or ADV to directly optimize fairness. However, the reduced AD comes at the cost of lower OA. In order to find the optimal trade-offs between fairness and accuracy, we use the SUHNPF framework in a MOO experimental setup. We use a BERT-based classifier and three different pairs of objective functions: WCE <italic>vs.</italic> GAP; WCE <italic>vs.</italic> CLA; and WCE <italic>vs.</italic> ADV, learning a linear MOO trade-off between the two competing objectives.</p>
<p><xref ref-type="fig" rid="F1">Fig. 1</xref> shows the results of the MOO experiments. SUHNPF allows us to control how important each objective (accuracy <italic>vs.</italic> fairness) is by choosing the value of <italic>&#x03B1;</italic>. At <italic>&#x03B1;</italic><bold>=1</bold>, we optimize only for accuracy, and at <italic>&#x03B1;</italic><bold>=0</bold>, only for fairness. We illustrate the different trade-offs at four points of the Pareto front (<italic>&#x03B1;</italic> = 0, 0.25, 0.5, and 0.75). We observe that with decreasing <italic>&#x03B1;</italic>, both AD and OA decrease. For ADV, the drop in AD is comparable to the drop in OA, which is not an efficient trade-off between accuracy and fairness. GAP and CLA maintain a relatively consistent OA, while GAP reduces AD far more than CLA, yielding the best trade-off for each <italic>&#x03B1;</italic>. See Appendix E for a discussion of metric divergence and tabulated experimental values. We conclude that GAP is consistently the best metric, across the SOO and MOO experimental setups and across different values of <italic>&#x03B1;</italic> in MOO.</p>
</sec>
</sec>
<sec id="sec6">
<title>Conclusion</title>
<p><italic>Optimizing fairness:</italic> since fairness measures embody different underlying assumptions and statistical choices, selecting an appropriate fairness metric often depends on the task, use case, and stakeholder priorities. In this work, we focus on the popular fairness objective of balancing accuracy across different demographic groups, also known as minimizing Accuracy Difference (AD). We show that our <italic>Group Accuracy Parity</italic> (GAP) measure directly optimizes AD without <italic>metric divergence</italic> between the loss function and the evaluation metric. Results show that GAP consistently achieves lower AD than prior work, with only a modest loss in OA, across datasets.</p>
<p><italic>MOO and toxic language detection:</italic> rather than force the users to settle for any single accuracy or fairness measure, we further adopt SUHNPF, a multi-objective optimization (MOO) framing for joint pursuit of multiple objectives. We learn the full Pareto manifold over competing objectives so that users can view the full space of feasible trade-offs and choose any desired trade-off on the solution manifold, <italic>a posteriori</italic>. We empirically demonstrate that our measure GAP performs better than alternative differentiable fairness objectives in reducing AD. To the best of our knowledge this is the first use of MOO for fair toxic language detection.</p>
<p><italic>Fairness and toxic language detection:</italic> we explore two different aspects of fairness in toxic language detection: 1) fair moderation of posts written by authors from different demographic groups; and 2) fair protection of different groups targeted by posts. We successfully improved the fairness of the models in both experimental setups, demonstrating the generality of the proposed approach.</p>
<p><italic>Extending GAP to multiple classes and demographic groups:</italic> we formulate GAP following the strict definition of AD, which is for two classes and two demographic groups. Fairness literature has discussed heuristics and formulations for extending AD to multi-group and multi-class classification and balancing between multiple groups. As a future work, GAP can be extended based on those hypotheses.</p>
<p><italic>Group identification:</italic> with author demographics in Davidson <italic>et al</italic>. (<xref rid="R18" ref-type="bibr">Davidson et al., 2017</xref>)&#x2019;s dataset, we rely on automatic detection of author dialect, which is noisy. With target group demographics in Wilds (<xref rid="R34" ref-type="bibr">Koh et al., 2021</xref>), we assume oracle knowledge of target groups from annotation, which would have to be noisily detected in practice. In both cases, therefore, we make simplifying assumptions in this work. Optimizing trade-offs with awareness of noise in detection of demographic groups thus remains another direction for future work.</p>
<p><italic>Dataset debiasing:</italic> recent studies highlight the risks of annotation bias, whether from annotator guidelines or from the annotators themselves. Sap <italic>et al</italic>. (<xref rid="R54" ref-type="bibr">Sap et al., 2019</xref>) and Davidson <italic>et al</italic>. (<xref rid="R17" ref-type="bibr">Davidson et al., 2019</xref>) analyse the correlation between race and gold-label toxicity in several datasets and find a strong association between African American English (AAE) markers and toxicity annotation. Because our work is restricted to balancing accuracy across the sensitive attribute, given the dataset as it is annotated, our results are limited by any such bias present in the data (<xref rid="R40" ref-type="bibr">Ludwig et al., 2024</xref>). Addressing such annotation bias thus remains another key direction for future work.</p>
<p><italic>Generality and scope of this work:</italic> we implement GAP and SUHNPF for the task of TL detection and demonstrate promising results: improved fairness and computational efficiency. However, our work can be extended to other tasks, datasets, and neural models in any practical situation where ensuring equal accuracy across different demographic groups is a desired objective. Recently, Kovatchev and Lease (Kovatchev &#x0026; Lease, 2024) demonstrated the significant impact of imbalanced data in popular NLP benchmarks; our work can help address that challenge.</p>
</sec>
</body>
<back>
<ack>
<title>Acknowledgments</title>
<p>We thank the anonymous reviewers for their valuable feedback. This research was supported in part by Amazon, Wipro, the Knight Foundation, the Micron Foundation, and by Good Systems (https://goodsystems.utexas.edu), a UT Austin Grand Challenge to develop responsible AI technologies. The statements herein reflect the authors&#x2019; opinions only.</p>
</ack>
<ref-list>
<title>References</title>
<ref id="R1"><element-citation publication-type="book"><person-group person-group-type="author"><name><surname>Angwin</surname><given-names>J.</given-names></name><name><surname>Larson</surname><given-names>J.</given-names></name><name><surname>Mattu</surname><given-names>S.</given-names></name><name><surname>Kirchner</surname><given-names>L.</given-names></name></person-group> <year>(2016)</year> <article-title>Machine bias</article-title><source>Ethics of data and analytics</source><fpage>254</fpage><lpage>264</lpage><publisher-name>Auerbach Publications</publisher-name></element-citation></ref>
<ref id="R2"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>Arango</surname><given-names>A.</given-names></name><name><surname>P&#x00E9;rez</surname><given-names>J.</given-names></name><name><surname>Poblete</surname><given-names>B.</given-names></name></person-group> <year>(2019)</year> <article-title>Hate speech detection is not as easy as you may think: A closer look at model validation</article-title><source>Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval</source><fpage>45</fpage><lpage>54</lpage></element-citation></ref>
<ref id="R3"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>Balashankar</surname><given-names>A.</given-names></name><name><surname>Lees</surname><given-names>A.</given-names></name></person-group> <year>(2022)</year> <article-title>The need for transparent demographic group trade-offs in credit risk and income classification</article-title><source>International Conference on Information</source><fpage>344</fpage><lpage>354</lpage></element-citation></ref>
<ref id="R4"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>Balashankar</surname><given-names>A.</given-names></name><name><surname>Lees</surname><given-names>A.</given-names></name><name><surname>Welty</surname><given-names>C.</given-names></name><name><surname>Subramanian</surname><given-names>L.</given-names></name></person-group> <year>(2019)</year> <source>What is fair? Exploring Pareto-efficiency for fairness constrained classifiers</source><comment>arXiv preprint arXiv:1910.14120</comment></element-citation></ref>
<ref id="R5"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>Ball-Burack</surname><given-names>A.</given-names></name><name><surname>Lee</surname><given-names>M. S. A.</given-names></name><name><surname>Cobbe</surname><given-names>J.</given-names></name><name><surname>Singh</surname><given-names>J.</given-names></name></person-group> <year>(2021)</year> <article-title>Differential tweetment: Mitigating racial dialect bias in harmful tweet detection</article-title><source>Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency</source><fpage>116</fpage><lpage>128</lpage><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1145/3442188.3445875">https://doi.org/10.1145/3442188.3445875</ext-link></element-citation></ref>
<ref id="R6"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>Bellamy</surname><given-names>R. K.</given-names></name><name><surname>Dey</surname><given-names>K.</given-names></name><name><surname>Hind</surname><given-names>M.</given-names></name><name><surname>Hoffman</surname><given-names>S. C.</given-names></name><name><surname>Houde</surname><given-names>S.</given-names></name><name><surname>Kannan</surname><given-names>K.</given-names></name><name><surname>Lohia</surname><given-names>P.</given-names></name><name><surname>Martino</surname><given-names>J.</given-names></name><name><surname>Mehta</surname><given-names>S.</given-names></name><name><surname>Mojsilovic</surname><given-names>A.</given-names></name></person-group><etal/> <year>(2018)</year> <article-title>AI Fairness 360: An extensible toolkit for detecting, understanding, and mitigating unwanted algorithmic bias</article-title><comment>arXiv preprint arXiv:1810.01943</comment></element-citation></ref>
<ref id="R7"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Berk</surname><given-names>R.</given-names></name><name><surname>Heidari</surname><given-names>H.</given-names></name><name><surname>Jabbari</surname><given-names>S.</given-names></name><name><surname>Kearns</surname><given-names>M.</given-names></name><name><surname>Roth</surname><given-names>A.</given-names></name></person-group> <year>(2021)</year> <article-title>Fairness in criminal justice risk assessments: The state of the art</article-title><source>Sociological Methods &#x0026; Research</source><volume>50</volume><issue>1</issue><fpage>3</fpage><lpage>44</lpage></element-citation></ref>
<ref id="R8"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>Blodgett</surname><given-names>S. L.</given-names></name><name><surname>Wei</surname><given-names>J.</given-names></name><name><surname>O&#x2019;Connor</surname><given-names>B.</given-names></name></person-group> <year>(2017)</year> <article-title>A dataset and classifier for recognizing social media English</article-title><source>Proceedings of the 3rd Workshop on Noisy User-generated Text</source><fpage>56</fpage><lpage>61</lpage></element-citation></ref>
<ref id="R9"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>Borkan</surname><given-names>D.</given-names></name><name><surname>Dixon</surname><given-names>L.</given-names></name><name><surname>Sorensen</surname><given-names>J.</given-names></name><name><surname>Thain</surname><given-names>N.</given-names></name><name><surname>Vasserman</surname><given-names>L.</given-names></name></person-group> <year>(2019)</year> <article-title>Nuanced metrics for measuring unintended bias with real data for text classification</article-title><source>Companion Proceedings of the 2019 World Wide Web Conference</source><fpage>491</fpage><lpage>500</lpage></element-citation></ref>
<ref id="R10"><element-citation publication-type="book"><person-group person-group-type="author"><name><surname>Boyd</surname><given-names>S.</given-names></name><name><surname>Vandenberghe</surname><given-names>L.</given-names></name></person-group> <year>(2004)</year> <source>Convex optimization</source><publisher-name>Cambridge University Press</publisher-name></element-citation></ref>
<ref id="R11"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>Caton</surname><given-names>S.</given-names></name><name><surname>Haas</surname><given-names>C.</given-names></name></person-group> <year>(2020)</year> <article-title>Fairness in machine learning: A survey</article-title><comment>arXiv preprint arXiv:2010.04053</comment></element-citation></ref>
<ref id="R12"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>Chaudhry</surname><given-names>P.</given-names></name><name><surname>Lease</surname><given-names>M.</given-names></name></person-group> <year>(2022)</year> <article-title>You are what you tweet: Profiling users by past tweets to improve hate speech detection</article-title><source>International Conference on Information</source><fpage>195</fpage><lpage>203</lpage></element-citation></ref>
<ref id="R13"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>Chen</surname><given-names>T.</given-names></name><name><surname>Wang</surname><given-names>D.</given-names></name><name><surname>Liang</surname><given-names>X.</given-names></name><name><surname>Risius</surname><given-names>M.</given-names></name><name><surname>Demartini</surname><given-names>G.</given-names></name><name><surname>Yin</surname><given-names>H.</given-names></name></person-group> <year>(2024)</year> <article-title>Hate speech detection with generalizable target-aware fairness</article-title><source>Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining</source><fpage>365</fpage><lpage>375</lpage></element-citation></ref>
<ref id="R14"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>Chollet</surname><given-names>F.</given-names></name></person-group> <year>(2015)</year> <article-title>Keras</article-title></element-citation></ref>
<ref id="R15"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Chouldechova</surname><given-names>A.</given-names></name></person-group> <year>(2017)</year> <article-title>Fair prediction with disparate impact: A study of bias in recidivism prediction instruments</article-title><source>Big Data</source><volume>5</volume><issue>2</issue><fpage>153</fpage><lpage>163</lpage></element-citation></ref>
<ref id="R16"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Das</surname><given-names>S.</given-names></name><name><surname>Donini</surname><given-names>M.</given-names></name><name><surname>Gelman</surname><given-names>J.</given-names></name><name><surname>Haas</surname><given-names>K.</given-names></name><name><surname>Hardt</surname><given-names>M.</given-names></name><name><surname>Katzman</surname><given-names>J.</given-names></name><name><surname>Kenthapadi</surname><given-names>K.</given-names></name><name><surname>Larroy</surname><given-names>P.</given-names></name><name><surname>Yilmaz</surname><given-names>P.</given-names></name><name><surname>Zafar</surname><given-names>M. B.</given-names></name></person-group> <year>(2021)</year> <article-title>Fairness measures for machine learning in finance</article-title><source>The Journal of Financial Data Science</source><volume>3</volume><issue>4</issue><fpage>33</fpage><lpage>64</lpage></element-citation></ref>
<ref id="R17"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>Davidson</surname><given-names>T.</given-names></name><name><surname>Bhattacharya</surname><given-names>D.</given-names></name><name><surname>Weber</surname><given-names>I.</given-names></name></person-group> <year>(2019)</year> <article-title>Racial bias in hate speech and abusive language detection datasets</article-title><comment>arXiv preprint arXiv:1905.12516</comment></element-citation></ref>
<ref id="R18"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>Davidson</surname><given-names>T.</given-names></name><name><surname>Warmsley</surname><given-names>D.</given-names></name><name><surname>Macy</surname><given-names>M.</given-names></name><name><surname>Weber</surname><given-names>I.</given-names></name></person-group> <year>(2017)</year> <article-title>Automated hate speech detection and the problem of offensive language</article-title><source>Proceedings of the International AAAI Conference on Web and Social Media</source></element-citation></ref>
<ref id="R19"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>Devlin</surname><given-names>J.</given-names></name><name><surname>Chang</surname><given-names>M.-W.</given-names></name><name><surname>Lee</surname><given-names>K.</given-names></name><name><surname>Toutanova</surname><given-names>K.</given-names></name></person-group> <year>(2018)</year> <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title><comment>arXiv preprint arXiv:1810.04805</comment></element-citation></ref>
<ref id="R20"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Dieterich</surname><given-names>W.</given-names></name><name><surname>Mendoza</surname><given-names>C.</given-names></name><name><surname>Brennan</surname><given-names>T.</given-names></name></person-group> <year>(2016)</year> <article-title>COMPAS risk scales: Demonstrating accuracy equity and predictive parity</article-title><source>Northpointe Inc</source><volume>7</volume><issue>4</issue></element-citation></ref>
<ref id="R21"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ekstrand</surname><given-names>M. D.</given-names></name><name><surname>Dass</surname><given-names>A.</given-names></name><name><surname>Burke</surname><given-names>R.</given-names></name><name><surname>Diaz</surname><given-names>F.</given-names></name></person-group><etal/> <year>(2022)</year> <article-title>Fairness in information access systems</article-title><source>Foundations and Trends&#x00AE; in Information Retrieval</source><volume>16</volume><issue>1&#x2013;2</issue><fpage>1</fpage><lpage>177</lpage></element-citation></ref>
<ref id="R22"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>Feldman</surname><given-names>M.</given-names></name><name><surname>Friedler</surname><given-names>S. A.</given-names></name><name><surname>Moeller</surname><given-names>J.</given-names></name><name><surname>Scheidegger</surname><given-names>C.</given-names></name><name><surname>Venkatasubramanian</surname><given-names>S.</given-names></name></person-group> <year>(2015)</year> <article-title>Certifying and removing disparate impact</article-title><source>Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source><fpage>259</fpage><lpage>268</lpage></element-citation></ref>
<ref id="R23"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Fortuna</surname><given-names>P.</given-names></name><name><surname>Nunes</surname><given-names>S.</given-names></name></person-group> <year>(2018)</year> <article-title>A survey on automatic detection of hate speech in text</article-title><source>ACM Computing Surveys (CSUR)</source><volume>51</volume><issue>4</issue><fpage>1</fpage><lpage>30</lpage></element-citation></ref>
<ref id="R24"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>Founta</surname><given-names>A.-M.</given-names></name><name><surname>Djouvas</surname><given-names>C.</given-names></name><name><surname>Chatzakou</surname><given-names>D.</given-names></name><name><surname>Leontiadis</surname><given-names>I.</given-names></name><name><surname>Blackburn</surname><given-names>J.</given-names></name><name><surname>Stringhini</surname><given-names>G.</given-names></name><name><surname>Vakali</surname><given-names>A.</given-names></name><name><surname>Sirivianos</surname><given-names>M.</given-names></name><name><surname>Kourtellis</surname><given-names>N.</given-names></name></person-group> <year>(2018)</year> <article-title>Large scale crowdsourcing and characterization of Twitter abusive behavior</article-title><ext-link ext-link-type="uri" xlink:href="https://open.bu.edu/handle/2144/40119">https://open.bu.edu/handle/2144/40119</ext-link></element-citation></ref>
<ref id="R25"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Friedler</surname><given-names>S. A.</given-names></name><name><surname>Scheidegger</surname><given-names>C.</given-names></name><name><surname>Venkatasubramanian</surname><given-names>S.</given-names></name></person-group> <year>(2021)</year> <article-title>The (im)possibility of fairness: Different value systems require different mechanisms for fair decision making</article-title><source>Communications of the ACM</source><volume>64</volume><issue>4</issue><fpage>136</fpage><lpage>143</lpage></element-citation></ref>
<ref id="R26"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>Friedler</surname><given-names>S. A.</given-names></name><name><surname>Scheidegger</surname><given-names>C.</given-names></name><name><surname>Venkatasubramanian</surname><given-names>S.</given-names></name><name><surname>Choudhary</surname><given-names>S.</given-names></name><name><surname>Hamilton</surname><given-names>E. P.</given-names></name><name><surname>Roth</surname><given-names>D.</given-names></name></person-group> <year>(2019)</year> <article-title>A comparative study of fairness-enhancing interventions in machine learning</article-title><source>Proceedings of the Conference on Fairness, Accountability, and Transparency</source><fpage>329</fpage><lpage>338</lpage></element-citation></ref>
<ref id="R27"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>Gamb&#x00E4;ck</surname><given-names>B.</given-names></name><name><surname>Sikdar</surname><given-names>U. K.</given-names></name></person-group> <year>(2017)</year> <article-title>Using convolutional neural networks to classify hate-speech</article-title><source>Proceedings of the first workshop on abusive language online</source><fpage>85</fpage><lpage>90</lpage></element-citation></ref>
<ref id="R28"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>Graves</surname><given-names>A.</given-names></name><name><surname>Fern&#x00E1;ndez</surname><given-names>S.</given-names></name><name><surname>Schmidhuber</surname><given-names>J.</given-names></name></person-group> <year>(2005)</year> <article-title>Bidirectional LSTM networks for improved phoneme classification and recognition</article-title><source>International Conference on Artificial Neural Networks</source><fpage>799</fpage><lpage>804</lpage></element-citation></ref>
<ref id="R29"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>Gupta</surname><given-names>S.</given-names></name><name><surname>Singh</surname><given-names>G.</given-names></name><name><surname>Bollapragada</surname><given-names>R.</given-names></name><name><surname>Lease</surname><given-names>M.</given-names></name></person-group> <year>(2022)</year> <article-title>Learning a Neural Pareto Manifold Extractor with Constraints</article-title><source>Proceedings of the 38th International Conference on Uncertainty in Artificial Intelligence (UAI)</source></element-citation></ref>
<ref id="R30"><element-citation publication-type="book"><person-group person-group-type="author"><name><surname>Ha</surname><given-names>D.</given-names></name><name><surname>Dai</surname><given-names>A. M.</given-names></name><name><surname>Le</surname><given-names>Q. V.</given-names></name></person-group> <year>(2017)</year> <article-title>Hypernetworks</article-title><source>5th International Conference on Learning Representations</source><comment>ICLR 2017</comment><publisher-loc>Toulon, France</publisher-loc><comment>April 24-26, 2017, Conference Track Proceedings</comment><ext-link ext-link-type="uri" xlink:href="https://openreview.net/forum?id=rkpACe1lx">https://openreview.net/forum?id=rkpACe1lx</ext-link></element-citation></ref>
<ref id="R31"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Hardt</surname><given-names>M.</given-names></name><name><surname>Price</surname><given-names>E.</given-names></name><name><surname>Srebro</surname><given-names>N.</given-names></name></person-group> <year>(2016)</year> <article-title>Equality of opportunity in supervised learning</article-title><source>Advances in neural information processing systems</source><volume>29</volume></element-citation></ref>
<ref id="R32"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>Heidari</surname><given-names>H.</given-names></name><name><surname>Loi</surname><given-names>M.</given-names></name><name><surname>Gummadi</surname><given-names>K. P.</given-names></name><name><surname>Krause</surname><given-names>A.</given-names></name></person-group> <year>(2019)</year> <article-title>A moral framework for understanding fair ML through economic models of equality of opportunity</article-title><source>Proceedings of the Conference on Fairness, Accountability, and Transparency</source><fpage>181</fpage><lpage>190</lpage></element-citation></ref>
<ref id="R33"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>Kingma</surname><given-names>D. P.</given-names></name><name><surname>Ba</surname><given-names>J.</given-names></name></person-group> <year>(2014)</year> <article-title>Adam: A method for stochastic optimization</article-title><comment>arXiv preprint arXiv:1412.6980</comment></element-citation></ref>
<ref id="R34"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>Koh</surname><given-names>P. W.</given-names></name><name><surname>Sagawa</surname><given-names>S.</given-names></name><name><surname>Marklund</surname><given-names>H.</given-names></name><name><surname>Xie</surname><given-names>S. M.</given-names></name><name><surname>Zhang</surname><given-names>M.</given-names></name><name><surname>Balsubramani</surname><given-names>A.</given-names></name><name><surname>Hu</surname><given-names>W.</given-names></name><name><surname>Yasunaga</surname><given-names>M.</given-names></name><name><surname>Phillips</surname><given-names>R. L.</given-names></name><name><surname>Gao</surname><given-names>I.</given-names></name></person-group><etal/> <year>(2021)</year> <article-title>WILDS: A benchmark of in-the-wild distribution shifts</article-title><source>International Conference on Machine Learning</source><fpage>5637</fpage><lpage>5664</lpage></element-citation></ref>
<ref id="R35"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>Kovatchev</surname><given-names>V.</given-names></name><name><surname>Lease</surname><given-names>M.</given-names></name></person-group> <year>(2024)</year> <comment>June</comment><article-title>Benchmark transparency: Measuring the impact of data on evaluation</article-title><person-group person-group-type="editor"><name><surname>Duh</surname><given-names>K.</given-names></name></person-group><person-group person-group-type="editor"><name><surname>Gomez</surname><given-names>H.</given-names></name></person-group><person-group person-group-type="editor"><name><surname>Bethard</surname><given-names>S.</given-names></name></person-group><source>Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)</source><fpage>1536</fpage><lpage>1551</lpage><comment>Association for Computational Linguistics</comment><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.18653/v1/2024.naacl-long.86">https://doi.org/10.18653/v1/2024.naacl-long.86</ext-link></element-citation></ref>
<ref id="R36"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>Lin</surname><given-names>T.-Y.</given-names></name><name><surname>Goyal</surname><given-names>P.</given-names></name><name><surname>Girshick</surname><given-names>R.</given-names></name><name><surname>He</surname><given-names>K.</given-names></name><name><surname>Doll&#x00E1;r</surname><given-names>P.</given-names></name></person-group> <year>(2017)</year> <article-title>Focal loss for dense object detection</article-title><source>Proceedings of the IEEE international conference on computer vision</source><fpage>2980</fpage><lpage>2988</lpage></element-citation></ref>
<ref id="R37"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>Lin</surname><given-names>X.</given-names></name><name><surname>Yang</surname><given-names>Z.</given-names></name><name><surname>Zhang</surname><given-names>Q.</given-names></name><name><surname>Kwong</surname><given-names>S.</given-names></name></person-group> <year>(2020)</year> <article-title>Controllable Pareto multi-task learning</article-title><comment>arXiv preprint arXiv:2010.06313</comment></element-citation></ref>
<ref id="R38"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>Lin</surname><given-names>X.</given-names></name><name><surname>Chen</surname><given-names>H.</given-names></name><name><surname>Pei</surname><given-names>C.</given-names></name><name><surname>Sun</surname><given-names>F.</given-names></name><name><surname>Xiao</surname><given-names>X.</given-names></name><name><surname>Sun</surname><given-names>H.</given-names></name><name><surname>Zhang</surname><given-names>Y.</given-names></name><name><surname>Ou</surname><given-names>W.</given-names></name><name><surname>Jiang</surname><given-names>P.</given-names></name></person-group> <year>(2019)</year> <article-title>A Pareto-efficient algorithm for multiple objective optimization in e-commerce recommendation</article-title><source>Proceedings of the 13th ACM Conference on Recommender Systems</source><fpage>20</fpage><lpage>28</lpage></element-citation></ref>
<ref id="R39"><element-citation publication-type="book"><person-group person-group-type="author"><name><surname>Little</surname><given-names>C.</given-names></name></person-group> <year>(2023)</year> <article-title>To the fairness frontier and beyond: Identifying, quantifying, and optimizing the fairness-accuracy Pareto frontier</article-title><comment>Master&#x2019;s thesis</comment><publisher-name>Rice University</publisher-name></element-citation></ref>
<ref id="R40"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>Ludwig</surname><given-names>F.</given-names></name><name><surname>Dolos</surname><given-names>K.</given-names></name><name><surname>Alves-Pinto</surname><given-names>A.</given-names></name><name><surname>Zesch</surname><given-names>T.</given-names></name></person-group> <year>(2024)</year> <article-title>Unraveling the dynamics of semi-supervised hate speech detection: The impact of unlabeled data characteristics and pseudo-labeling strategies</article-title><source>Findings of the Association for Computational Linguistics: EACL 2024</source><fpage>1974</fpage><lpage>1986</lpage></element-citation></ref>
<ref id="R41"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>MacAvaney</surname><given-names>S.</given-names></name><name><surname>Yao</surname><given-names>H.-R.</given-names></name><name><surname>Yang</surname><given-names>E.</given-names></name><name><surname>Russell</surname><given-names>K.</given-names></name><name><surname>Goharian</surname><given-names>N.</given-names></name><name><surname>Frieder</surname><given-names>O.</given-names></name></person-group> <year>(2019)</year> <article-title>Hate speech detection: Challenges and solutions</article-title><source>PLoS ONE</source><volume>14</volume><issue>8</issue></element-citation></ref>
<ref id="R42"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>Martinez</surname><given-names>N.</given-names></name><name><surname>Bertran</surname><given-names>M.</given-names></name><name><surname>Sapiro</surname><given-names>G.</given-names></name></person-group> <year>(2020)</year> <article-title>Minimax Pareto fairness: A multi-objective perspective</article-title><source>Proceedings of the 37th International Conference on Machine Learning</source></element-citation></ref>
<ref id="R43"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Metzler</surname><given-names>D.</given-names></name><name><surname>Croft</surname><given-names>W. B.</given-names></name></person-group> <year>(2007)</year> <article-title>Linear feature-based models for information retrieval</article-title><source>Information Retrieval</source><volume>10</volume><issue>3</issue><fpage>257</fpage><lpage>274</lpage></element-citation></ref>
<ref id="R44"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Mitchell</surname><given-names>S.</given-names></name><name><surname>Potash</surname><given-names>E.</given-names></name><name><surname>Barocas</surname><given-names>S.</given-names></name><name><surname>D&#x2019;Amour</surname><given-names>A.</given-names></name><name><surname>Lum</surname><given-names>K.</given-names></name></person-group> <year>(2021)</year> <article-title>Algorithmic fairness: Choices, assumptions, and definitions</article-title><source>Annual Review of Statistics and Its Application</source><volume>8</volume><fpage>141</fpage><lpage>163</lpage></element-citation></ref>
<ref id="R45"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>Morgan</surname><given-names>W.</given-names></name><name><surname>Greiff</surname><given-names>W.</given-names></name><name><surname>Henderson</surname><given-names>J.</given-names></name></person-group> <year>(2004)</year> <article-title>Direct maximization of average precision by hill-climbing, with a comparison to a maximum entropy approach</article-title><source>Proceedings of HLT-NAACL 2004: Short Papers</source><fpage>93</fpage><lpage>96</lpage></element-citation></ref>
<ref id="R46"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>Narayanan</surname><given-names>A.</given-names></name></person-group> <year>(2018)</year> <article-title>21 fairness definitions and their politics: A tutorial</article-title><source>Proceedings of the ACM FAccT Conference on Fairness, Accountability and Transparency</source><ext-link ext-link-type="uri" xlink:href="https://shubhamjain0594.github.io/post/tlds-arvind-fairness-definitions/">https://shubhamjain0594.github.io/post/tlds-arvind-fairness-definitions/</ext-link></element-citation></ref>
<ref id="R47"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>Navon</surname><given-names>A.</given-names></name><name><surname>Shamsian</surname><given-names>A.</given-names></name><name><surname>Fetaya</surname><given-names>E.</given-names></name><name><surname>Chechik</surname><given-names>G.</given-names></name></person-group> <year>(2021)</year> <article-title>Learning the Pareto front with hypernetworks</article-title><source>International Conference on Learning Representations</source></element-citation></ref>
<ref id="R48"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>Park</surname><given-names>J. H.</given-names></name><name><surname>Shin</surname><given-names>J.</given-names></name><name><surname>Fung</surname><given-names>P.</given-names></name></person-group> <year>(2018)</year> <article-title>Reducing gender bias in abusive language detection</article-title><source>Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing</source><fpage>2799</fpage><lpage>2804</lpage><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.18653/v1/D18-1302">https://doi.org/10.18653/v1/D18-1302</ext-link></element-citation></ref>
<ref id="R49"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Pedregosa</surname><given-names>F.</given-names></name><name><surname>Varoquaux</surname><given-names>G.</given-names></name><name><surname>Gramfort</surname><given-names>A.</given-names></name><name><surname>Michel</surname><given-names>V.</given-names></name><name><surname>Thirion</surname><given-names>B.</given-names></name><name><surname>Grisel</surname><given-names>O.</given-names></name><name><surname>Blondel</surname><given-names>M.</given-names></name><name><surname>Prettenhofer</surname><given-names>P.</given-names></name><name><surname>Weiss</surname><given-names>R.</given-names></name><name><surname>Dubourg</surname><given-names>V.</given-names></name><name><surname>Vanderplas</surname><given-names>J.</given-names></name><name><surname>Passos</surname><given-names>A.</given-names></name><name><surname>Cournapeau</surname><given-names>D.</given-names></name><name><surname>Brucher</surname><given-names>M.</given-names></name><name><surname>Perrot</surname><given-names>M.</given-names></name><name><surname>Duchesnay</surname><given-names>E.</given-names></name></person-group> <year>(2011)</year> <article-title>Scikit-learn: Machine learning in Python</article-title><source>Journal of Machine Learning Research</source><volume>12</volume><fpage>2825</fpage><lpage>2830</lpage></element-citation></ref>
<ref id="R50"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Poletto</surname><given-names>F.</given-names></name><name><surname>Basile</surname><given-names>V.</given-names></name><name><surname>Sanguinetti</surname><given-names>M.</given-names></name><name><surname>Bosco</surname><given-names>C.</given-names></name><name><surname>Patti</surname><given-names>V.</given-names></name></person-group> <year>(2021)</year> <article-title>Resources and benchmark corpora for hate speech detection: A systematic review</article-title><source>Language Resources and Evaluation</source><volume>55</volume><issue>2</issue><fpage>477</fpage><lpage>523</lpage></element-citation></ref>
<ref id="R51"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>Rahman</surname><given-names>M. M.</given-names></name><name><surname>Kutlu</surname><given-names>M.</given-names></name><name><surname>Lease</surname><given-names>M.</given-names></name></person-group> <year>(2022)</year> <article-title>Understanding and predicting characteristics of test collections in information retrieval</article-title><source>International Conference on Information</source><fpage>136</fpage><lpage>148</lpage></element-citation></ref>
<ref id="R52"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>R&#x00F6;ttger</surname><given-names>P.</given-names></name><name><surname>Vidgen</surname><given-names>B.</given-names></name><name><surname>Nguyen</surname><given-names>D.</given-names></name><name><surname>Waseem</surname><given-names>Z.</given-names></name><name><surname>Margetts</surname><given-names>H.</given-names></name><name><surname>Pierrehumbert</surname><given-names>J.</given-names></name></person-group> <year>(2021)</year> <article-title>HateCheck: Functional tests for hate speech detection models</article-title><source>Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</source><fpage>41</fpage><lpage>58</lpage><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.18653/v1/2021.acl-long.4">https://doi.org/10.18653/v1/2021.acl-long.4</ext-link></element-citation></ref>
<ref id="R53"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>Sang</surname><given-names>Y.</given-names></name><name><surname>Stanton</surname><given-names>J.</given-names></name></person-group> <year>(2022)</year> <article-title>The origin and value of disagreement among data labellers: A case study of individual differences in hate speech annotation</article-title><source>International Conference on Information</source><fpage>425</fpage><lpage>444</lpage></element-citation></ref>
<ref id="R54"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>Sap</surname><given-names>M.</given-names></name><name><surname>Card</surname><given-names>D.</given-names></name><name><surname>Gabriel</surname><given-names>S.</given-names></name><name><surname>Choi</surname><given-names>Y.</given-names></name><name><surname>Smith</surname><given-names>N. A.</given-names></name></person-group> <year>(2019)</year> <article-title>The risk of racial bias in hate speech detection</article-title><source>Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics</source><fpage>1668</fpage><lpage>1678</lpage></element-citation></ref>
<ref id="R55"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>Schmidt</surname><given-names>A.</given-names></name><name><surname>Wiegand</surname><given-names>M.</given-names></name></person-group> <year>(2017)</year> <article-title>A survey on hate speech detection using natural language processing</article-title><source>Proceedings of the Fifth International Workshop on Natural Language Processing for Social Media</source><fpage>1</fpage><lpage>10</lpage><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.18653/v1/W17-1101">https://doi.org/10.18653/v1/W17-1101</ext-link></element-citation></ref>
<ref id="R56"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>Shen</surname><given-names>A.</given-names></name><name><surname>Han</surname><given-names>X.</given-names></name><name><surname>Cohn</surname><given-names>T.</given-names></name><name><surname>Baldwin</surname><given-names>T.</given-names></name><name><surname>Frermann</surname><given-names>L.</given-names></name></person-group> <year>(2022)</year> <article-title>Optimising equal opportunity fairness in model training</article-title><source>Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source><fpage>4073</fpage><lpage>4084</lpage><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.18653/v1/2022.naacl-main.299">https://doi.org/10.18653/v1/2022.naacl-main.299</ext-link></element-citation></ref>
<ref id="R57"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>Sorensen</surname><given-names>T.</given-names></name><name><surname>Moore</surname><given-names>J.</given-names></name><name><surname>Fisher</surname><given-names>J.</given-names></name><name><surname>Gordon</surname><given-names>M.</given-names></name><name><surname>Mireshghallah</surname><given-names>N.</given-names></name><name><surname>Rytting</surname><given-names>C. M.</given-names></name><name><surname>Ye</surname><given-names>A.</given-names></name><name><surname>Jiang</surname><given-names>L.</given-names></name><name><surname>Lu</surname><given-names>X.</given-names></name><name><surname>Dziri</surname><given-names>N.</given-names></name></person-group><etal/> <year>(2024)</year> <article-title>A roadmap to pluralistic alignment</article-title><comment>arXiv preprint arXiv:2402.05070</comment></element-citation></ref>
<ref id="R58"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Soto</surname><given-names>C. P.</given-names></name><name><surname>Nunes</surname><given-names>G. M.</given-names></name><name><surname>Gomes</surname><given-names>J. G. R.</given-names></name><name><surname>Nedjah</surname><given-names>N.</given-names></name></person-group> <year>(2022)</year> <article-title>Application-specific word embeddings for hate and offensive language detection</article-title><source>Multimedia Tools and Applications</source><volume>81</volume><issue>19</issue><fpage>27111</fpage><lpage>27136</lpage></element-citation></ref>
<ref id="R59"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>Suau</surname><given-names>X.</given-names></name><name><surname>Delobelle</surname><given-names>P.</given-names></name><name><surname>Metcalf</surname><given-names>K.</given-names></name><name><surname>Joulin</surname><given-names>A.</given-names></name><name><surname>Apostoloff</surname><given-names>N.</given-names></name><name><surname>Zappella</surname><given-names>L.</given-names></name><name><surname>Rodr&#x00ED;guez</surname><given-names>P.</given-names></name></person-group> <year>(2024)</year> <article-title>Whispering experts: Neural interventions for toxicity mitigation in language models</article-title><comment>arXiv preprint arXiv:2407.12824</comment></element-citation></ref>
<ref id="R60"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Swezey</surname><given-names>R.</given-names></name><name><surname>Grover</surname><given-names>A.</given-names></name><name><surname>Charron</surname><given-names>B.</given-names></name><name><surname>Ermon</surname><given-names>S.</given-names></name></person-group> <year>(2021)</year> <article-title>PiRank: Scalable learning to rank via differentiable sorting</article-title><source>Advances in Neural Information Processing Systems</source><volume>34</volume><fpage>21644</fpage><lpage>21654</lpage></element-citation></ref>
<ref id="R61"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>Valdivia</surname><given-names>A.</given-names></name><name><surname>S&#x00E1;nchez-Monedero</surname><given-names>J.</given-names></name><name><surname>Casillas</surname><given-names>J.</given-names></name></person-group> <year>(2020)</year> <article-title>How fair can we go in machine learning? Assessing the boundaries of fairness in decision trees</article-title><comment>arXiv preprint arXiv:2006.12399</comment></element-citation></ref>
<ref id="R62"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Vidgen</surname><given-names>B.</given-names></name><name><surname>Derczynski</surname><given-names>L.</given-names></name></person-group> <year>(2020)</year> <article-title>Directions in abusive language training data, a systematic review: Garbage in, garbage out</article-title><source>PLoS ONE</source><volume>15</volume><issue>12</issue><fpage>e0243300</fpage></element-citation></ref>
<ref id="R63"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>Wei</surname><given-names>S.</given-names></name><name><surname>Niethammer</surname><given-names>M.</given-names></name></person-group> <year>(2020)</year> <article-title>The fairness-accuracy pareto front</article-title><comment>arXiv preprint arXiv:2008.10797</comment></element-citation></ref>
<ref id="R64"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>Xia</surname><given-names>M.</given-names></name><name><surname>Field</surname><given-names>A.</given-names></name><name><surname>Tsvetkov</surname><given-names>Y.</given-names></name></person-group> <year>(2020)</year> <article-title>Demoting racial bias in hate speech detection</article-title><source>Proceedings of the Eighth International Workshop on Natural Language Processing for Social Media</source><fpage>7</fpage><lpage>14</lpage><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.18653/v1/2020.socialnlp-1.2">https://doi.org/10.18653/v1/2020.socialnlp-1.2</ext-link></element-citation></ref>
<ref id="R65"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>Yue</surname><given-names>Y.</given-names></name><name><surname>Finley</surname><given-names>T.</given-names></name><name><surname>Radlinski</surname><given-names>F.</given-names></name><name><surname>Joachims</surname><given-names>T.</given-names></name></person-group> <year>(2007)</year> <article-title>A support vector method for optimizing average precision</article-title><source>Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval</source><fpage>271</fpage><lpage>278</lpage></element-citation></ref>
<ref id="R66"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>Zampieri</surname><given-names>M.</given-names></name><name><surname>Nakov</surname><given-names>P.</given-names></name><name><surname>Rosenthal</surname><given-names>S.</given-names></name><name><surname>Atanasova</surname><given-names>P.</given-names></name><name><surname>Karadzhov</surname><given-names>G.</given-names></name><name><surname>Mubarak</surname><given-names>H.</given-names></name><name><surname>Derczynski</surname><given-names>L.</given-names></name><name><surname>Pitenis</surname><given-names>Z.</given-names></name><name><surname>&#x00C7;&#x00F6;ltekin</surname><given-names>&#x00C7;.</given-names></name></person-group> <year>(2020)</year> <article-title>SemEval-2020 Task 12: Multilingual Offensive Language Identification in Social Media (OffensEval 2020)</article-title><source>Proceedings of SemEval</source></element-citation></ref>
<ref id="R67"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>Zhao</surname><given-names>H.</given-names></name><name><surname>Coston</surname><given-names>A.</given-names></name><name><surname>Adel</surname><given-names>T.</given-names></name><name><surname>Gordon</surname><given-names>G. J.</given-names></name></person-group> <year>(2020)</year> <article-title>Conditional learning of fair representations</article-title><source>8th International Conference on Learning Representations</source><comment>ICLR 2020</comment></element-citation></ref>
</ref-list>
<app-group>
<app id="app1">
<title>Appendix</title>
<sec id="app_sec1">
<title>Relating BCE and GAP measures</title>
<p>We provide a step-by-step derivation from BCE to the GAP measure and analyse how the measures are correlated, to highlight their interplay. Before delving into the measures, we set up the notation and classes used to illustrate the relation.</p>
</sec>
<sec id="app_sec2">
<title>Binary cross entropy</title>
<p>Binary cross entropy (BCE), as formulated in Eq. 2, is typically used as a loss function for optimizing a classifier. Although the correspondence is not strictly one-to-one, minimizing the BCE loss is generally observed to maximize accuracy. The BCE formulation does not consider imbalance across class frequencies, hence it might be biased towards the majority class label. It also does not consider the sensitive attributes.</p>
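To make the majority-class bias concrete, the following minimal NumPy sketch (illustrative only; the toy labels and predictions are hypothetical, not from the datasets used in the paper) computes BCE as in Eq. 2 and shows that a classifier leaning towards the majority non-toxic class can still attain a low loss:

```python
import numpy as np

def bce(y, y_hat, eps=1e-12):
    # Binary cross entropy (Eq. 2): mean over all N samples,
    # with no re-weighting for class frequency or sensitive attributes.
    y_hat = np.clip(y_hat, eps, 1 - eps)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# Imbalanced toy data: four non-toxic samples (y = 0), one toxic (y = 1).
y = np.array([0, 0, 0, 0, 1])
# Predictions biased towards the majority (non-toxic) class:
# the single toxic sample is scored poorly, yet the mean loss stays low.
y_hat = np.array([0.1, 0.1, 0.1, 0.1, 0.4])
print(bce(y, y_hat))  # low average loss despite missing the toxic sample
```

The averaging over all N samples is what lets the four confident majority-class predictions mask the error on the single toxic one.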
</sec>
<sec id="app_sec3">
<title>Weighted cross entropy</title>
<p>One way to account for the imbalance across toxic and non-toxic labels (<italic>y</italic>) is weighted cross entropy (WCE), a variation of BCE that re-weights the error for each class in proportion to the inverse frequency of its labels (<italic>y</italic>). This re-weighting strategy is available in popular packages like SkLearn (<xref rid="R49" ref-type="bibr">Pedregosa et al., 2011</xref>) and is discussed in detail by Lin <italic>et al</italic>. (<xref rid="R36" ref-type="bibr">Lin et al., 2017</xref>). In Eq. 9 we replicate the BCE (Eq. 2) terms twice, which introduces only a duplication without altering the formulation. To ensure class balancing across the toxic and non-toxic classes, we scale each duplicate term by the sample count of the opposite class (toxic: <italic>P</italic>, non-toxic: <italic>Q</italic>) while performing the summation. When there is no class imbalance, <italic>i.e., P</italic> = <italic>Q</italic>, WCE reduces to 2&#x00B7;BCE, which has the same loss trajectory as BCE. The definition of WCE in Eq. 10 is differentiable, owing to its similar form to BCE, and shares all the properties of BCE that allow it to be used as a loss for optimizing binary classifiers. WCE attempts to reduce the bias towards the majority label through the inverse sample-count scaling, <italic>i.e.,</italic> the majority and minority classes are scaled by the sample counts of their opposite classes respectively.</p>
<disp-formula><label>(9)</label><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" display="block"><mml:mrow><mml:mi>B</mml:mi><mml:mi>C</mml:mi><mml:mi>E</mml:mi><mml:mo>&#x221D;</mml:mo><mml:mo>&#x2212;</mml:mo><mml:mfenced><mml:mrow><mml:mfrac><mml:mn>1</mml:mn><mml:mi>N</mml:mi></mml:mfrac><mml:mstyle displaystyle='true'><mml:munder><mml:mo>&#x2211;</mml:mo><mml:mi>N</mml:mi></mml:munder><mml:mrow><mml:mi>y</mml:mi><mml:mi>log</mml:mi><mml:mfenced><mml:mover accent='true'><mml:mi>y</mml:mi><mml:mo>&#x005E;</mml:mo></mml:mover></mml:mfenced><mml:mo>+</mml:mo><mml:mfenced><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mi>y</mml:mi></mml:mrow></mml:mfenced><mml:mi>log</mml:mi><mml:mfenced><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mover accent='true'><mml:mi>y</mml:mi><mml:mo>&#x005E;</mml:mo></mml:mover></mml:mrow></mml:mfenced></mml:mrow></mml:mstyle></mml:mrow></mml:mfenced><mml:mo>&#x2212;</mml:mo><mml:mfenced><mml:mrow><mml:mfrac><mml:mn>1</mml:mn><mml:mi>N</mml:mi></mml:mfrac><mml:mstyle displaystyle='true'><mml:munder><mml:mo>&#x2211;</mml:mo><mml:mi>N</mml:mi></mml:munder><mml:mrow><mml:mi>y</mml:mi><mml:mi>log</mml:mi><mml:mfenced><mml:mover accent='true'><mml:mi>y</mml:mi><mml:mo>&#x005E;</mml:mo></mml:mover></mml:mfenced><mml:mo>+</mml:mo><mml:mfenced><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mi>y</mml:mi></mml:mrow></mml:mfenced><mml:mi>log</mml:mi><mml:mfenced><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mover accent='true'><mml:mi>y</mml:mi><mml:mo>&#x005E;</mml:mo></mml:mover></mml:mrow></mml:mfenced></mml:mrow></mml:mstyle></mml:mrow></mml:mfenced></mml:mrow></mml:math></disp-formula>
<disp-formula><label>(10)</label><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" display="block"><mml:mrow><mml:mi>W</mml:mi><mml:mi>C</mml:mi><mml:mi>E</mml:mi><mml:mo>=</mml:mo><mml:munder><mml:munder><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mfrac><mml:mi>Q</mml:mi><mml:mi>N</mml:mi></mml:mfrac><mml:mstyle displaystyle='true'><mml:munder><mml:mo>&#x2211;</mml:mo><mml:mi>N</mml:mi></mml:munder><mml:mrow><mml:mi>y</mml:mi><mml:mi>log</mml:mi><mml:mfenced><mml:mover accent='true'><mml:mi>y</mml:mi><mml:mo>&#x005E;</mml:mo></mml:mover></mml:mfenced><mml:mo>+</mml:mo><mml:mfenced><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mi>y</mml:mi></mml:mrow></mml:mfenced><mml:mi>log</mml:mi><mml:mfenced><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mover accent='true'><mml:mi>y</mml:mi><mml:mo>&#x005E;</mml:mo></mml:mover></mml:mrow></mml:mfenced></mml:mrow></mml:mstyle></mml:mrow><mml:mo stretchy='true'>&#xFE38;</mml:mo></mml:munder><mml:mrow><mml:mtext>BCE</mml:mtext><mml:mtext>&#x2009;</mml:mtext><mml:mtext>loss</mml:mtext><mml:mtext>&#x2009;</mml:mtext><mml:mtext>for</mml:mtext><mml:mtext>&#x2009;</mml:mtext><mml:mtext>toxic</mml:mtext><mml:mtext>&#x2009;</mml:mtext><mml:mtext>class</mml:mtext><mml:mtext>&#x2009;</mml:mtext><mml:mo stretchy='false'>(</mml:mo><mml:mtext>y</mml:mtext><mml:mo>=</mml:mo><mml:mtext>1</mml:mtext><mml:mo stretchy='false'>)</mml:mo><mml:mtext>&#x2009;</mml:mtext><mml:mtext>with</mml:mtext><mml:mtext>&#x2009;</mml:mtext><mml:mtext>scaling</mml:mtext></mml:mrow></mml:munder><mml:mtext>&#x2009;</mml:mtext><mml:munder><mml:munder><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mfrac><mml:mi>P</mml:mi><mml:mi>N</mml:mi></mml:mfrac><mml:mstyle displaystyle='true'><mml:munder><mml:mo>&#x2211;</mml:mo><mml:mi>N</mml:mi></mml:munder><mml:mrow><mml:mi>y</mml:mi><mml:mi>log</mml:mi><mml:mfenced><mml:mover 
accent='true'><mml:mi>y</mml:mi><mml:mo>&#x005E;</mml:mo></mml:mover></mml:mfenced><mml:mo>+</mml:mo><mml:mfenced><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mi>y</mml:mi></mml:mrow></mml:mfenced><mml:mi>log</mml:mi><mml:mfenced><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mover accent='true'><mml:mi>y</mml:mi><mml:mo>&#x005E;</mml:mo></mml:mover></mml:mrow></mml:mfenced></mml:mrow></mml:mstyle></mml:mrow><mml:mo stretchy='true'>&#xFE38;</mml:mo></mml:munder><mml:mrow><mml:mtext>BCE</mml:mtext><mml:mtext>&#x2009;</mml:mtext><mml:mtext>loss</mml:mtext><mml:mtext>&#x2009;</mml:mtext><mml:mtext>for</mml:mtext><mml:mtext>&#x2009;</mml:mtext><mml:mtext>non-toxic</mml:mtext><mml:mtext>&#x2009;</mml:mtext><mml:mtext>class</mml:mtext><mml:mtext>&#x2009;</mml:mtext><mml:mo stretchy='false'>(</mml:mo><mml:mtext>y</mml:mtext><mml:mo>=</mml:mo><mml:mtext>0</mml:mtext><mml:mo stretchy='false'>)</mml:mo><mml:mtext>&#x2009;</mml:mtext><mml:mtext>with</mml:mtext><mml:mtext>&#x2009;</mml:mtext><mml:mtext>scaling</mml:mtext></mml:mrow></mml:munder></mml:mrow></mml:math></disp-formula>
<p><italic>Remark 1.</italic> Rescaling the majority and minority labels (<italic>y</italic>) with their inverse frequency only ensures reduced bias towards the majority label; it does not optimize for equal accuracy across the two labels.</p>
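The scaling in Eq. 10 can be sketched numerically. The following is an illustrative reading of the equation, not the paper's implementation: each per-class BCE sum is scaled by the sample count of the opposite class, and in a balanced toy example (<italic>P</italic> = <italic>Q</italic>) the resulting loss is a constant multiple of BCE, confirming the identical loss trajectory:

```python
import numpy as np

def bce(y, y_hat, eps=1e-12):
    y_hat = np.clip(y_hat, eps, 1 - eps)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def wce(y, y_hat, eps=1e-12):
    # Weighted cross entropy (Eq. 10, sketch): the toxic-class term is
    # scaled by the non-toxic count Q, the non-toxic term by the toxic
    # count P, with y and (1 - y) masking each class's contribution.
    N = len(y)
    P = y.sum()        # toxic sample count
    Q = N - P          # non-toxic sample count
    y_hat = np.clip(y_hat, eps, 1 - eps)
    toxic = -(Q / N) * np.sum(y * np.log(y_hat))
    non_toxic = -(P / N) * np.sum((1 - y) * np.log(1 - y_hat))
    return toxic + non_toxic

# Balanced toy data (P = Q = 2): the WCE/BCE ratio is the same constant
# for any predictions, so both losses share the same minimiser.
y = np.array([0, 0, 1, 1])
a = np.array([0.3, 0.2, 0.9, 0.7])
b = np.array([0.4, 0.1, 0.8, 0.6])
print(wce(y, a) / bce(y, a), wce(y, b) / bce(y, b))  # equal ratios
```

Under class imbalance the two ratios diverge, which is precisely the re-weighting at work.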
</sec>
<sec id="app_sec4">
<title>WCE w.r.t sensitive group attribute</title>
<p>While WCE accounts for the label imbalance in the dataset, it still does not consider the notion of fairness and the different sub-populations. The core idea behind WCE is that we can <italic>&#x2018;copy&#x2019;</italic> the loss function twice and then apply mathematical transformations to it while maintaining differentiability. We apply the same idea to derive our loss function for fairness. We calculate two separate instances of WCE: <italic>WCE</italic> (<italic>g</italic> = 1), calculated on the data samples of group 1, and <italic>WCE</italic> (<italic>g</italic> = 0), calculated on the data samples of group 0. GAP, in essence, is the 2-norm of the difference between the WCEs across the sensitive attribute <italic>s</italic>. The GAP loss function in Eq. 3 attains its minimum only when the two WCE errors match across the binary sensitive attribute. Note that, unlike WCE, our measure GAP is defined as the difference between the two loss functions rather than their weighted sum. Therefore, GAP reaches its minimum when the two sub-populations of the sensitive group attribute (<italic>s</italic>) achieve the same accuracy.</p>
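Under the same illustrative reading (a hypothetical minimal sketch, not the paper's implementation), GAP is the 2-norm of the difference between the two group-wise WCE losses, and it vanishes when the model errs identically on both groups:

```python
import numpy as np

def wce(y, y_hat, eps=1e-12):
    # Class-balanced cross entropy as in Eq. 10 (sketch).
    N = len(y)
    P = y.sum()
    Q = N - P
    y_hat = np.clip(y_hat, eps, 1 - eps)
    return (-(Q / N) * np.sum(y * np.log(y_hat))
            - (P / N) * np.sum((1 - y) * np.log(1 - y_hat)))

def gap(y, y_hat, g):
    # GAP (Eq. 3, sketch): 2-norm of the difference between WCE computed
    # separately on each sub-population of the sensitive attribute g.
    w1 = wce(y[g == 1], y_hat[g == 1])
    w0 = wce(y[g == 0], y_hat[g == 0])
    return np.sqrt((w1 - w0) ** 2)

# Identical errors on both groups: GAP is zero even though the overall
# loss is non-zero, matching the equal-accuracy interpretation.
y     = np.array([0, 1, 0, 1])
y_hat = np.array([0.2, 0.8, 0.2, 0.8])
g     = np.array([0, 0, 1, 1])
print(gap(y, y_hat, g))  # approximately zero
```

Because GAP is built from differentiable WCE terms, it can be dropped into gradient-based training exactly as described above.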
</sec>
<sec id="app_sec5">
<title>Datasets</title>
<p>We consider two datasets: Davidson <italic>et al</italic>. (<xref rid="R18" ref-type="bibr">Davidson et al., 2017</xref>) for author demographics and the <italic>Civil Comments</italic> (<xref rid="R9" ref-type="bibr">Borkan et al., 2019</xref>) portion of <italic>Wilds</italic> (<xref rid="R34" ref-type="bibr">Koh et al., 2021</xref>) for target demographics. In each case, we frame the task as a binary classification problem (Toxic <italic>vs.</italic> non-Toxic, or &#x201C;safe&#x201D;) with binary sensitive attributes (Majority <italic>vs.</italic> Minority, the under-represented, sensitive attribute). For Davidson, since no explicit train-test split exists, we randomly split the dataset 90%&#x2013;10% into train and test, using SkLearn&#x2019;s (<xref rid="R49" ref-type="bibr">Pedregosa et al., 2011</xref>) stratified sampling to ensure a similar proportion of positive and negative tweets across the splits. For Wilds (<xref rid="R34" ref-type="bibr">Koh et al., 2021</xref>), we select tweets where more than 50% of annotators agreed on both the gender of the target and the toxicity label. Note that the annotation for male and female in the dataset is carried out separately, so a tweet may be targeted at both male and female; we include such tweets in both portions as independent samples. This pre-processing is applied to both the train and test splits for evaluation purposes.</p>
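The 90%&#x2013;10% stratified split for Davidson can be sketched with SkLearn's <monospace>train_test_split</monospace>. This is a hypothetical minimal example: <monospace>texts</monospace> and <monospace>labels</monospace> are toy stand-ins for the actual tweets and toxicity labels, and the toy labels are balanced purely for brevity:

```python
from sklearn.model_selection import train_test_split

# Toy stand-ins for the dataset (the real data is larger and imbalanced).
texts  = [f"tweet {i}" for i in range(20)]
labels = [0] * 10 + [1] * 10

# stratify=labels keeps the toxic/non-toxic proportion similar
# across the 90% train and 10% test portions; random_state fixes the seed.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.10, random_state=42, stratify=labels)
print(len(X_train), len(X_test), sorted(y_test))
```

With 20 samples and a 10% test fraction, the stratified split allocates the two test samples one per class, mirroring the overall label proportions.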
</sec>
<sec id="app_sec6">
<title>Setup</title>
<p>Experiments use an Nvidia RTX 2060 Super 8GB GPU, an Intel Core i7-9700F 3.0GHz 8-core CPU, and 16GB of DDR4 memory. We use the Keras (<xref rid="R14" ref-type="bibr">Chollet, 2015</xref>) library on a TensorFlow 2.0 backend with Python 3.7 to train the networks in this paper. For optimization, we use AdaMax (<xref rid="R33" ref-type="bibr">Kingma &#x0026; Ba, 2014</xref>) with learning rate <italic>lr</italic> = 0.001 and 1000 steps per epoch. For each configuration, we performed five independent runs and report the mean and variance.</p>
</sec>
<sec id="app_sec7">
<title>Runtime</title>
<p>The benefit of any Pareto HyperNetwork is that it traces out an approximation of the front of feasible values during training, so that users can extract the neural weights corresponding to their desired trade-off values <italic>a posteriori</italic>. In our experiments, for the five trade-off values shown, one can proceed in either of two ways.</p>
<list list-type="order">
<list-item><p>Run the BERT model five times, each with a different trade-off in the loss function</p></list-item>
<list-item><p>Run the BERT model once, with the Pareto HyperNetwork supervising it.</p></list-item>
</list>
<p>The BERT model ran for 10 epochs at &#x223C;10 mins per epoch, for a total runtime of &#x223C;100 mins. Running the same configuration for five trade-offs would equate to &#x223C;500 mins of runtime; thus, any additional trade-off the user desires would cost an extra &#x223C;100 mins. The SUHNPF Pareto HyperNetwork, on the other hand, approximates a manifold of trade-off values while supervising the BERT model: the BERT model still takes &#x223C;100 mins, with the supervising network taking an additional &#x223C;60 mins for the manifold approximation. Extracting the weights of the BERT model post hoc takes a further &#x223C;20 mins per trade-off. Therefore, while both of the prescribed approaches would yield roughly similar results from optimizing the BERT model, Approach 1 takes &#x223C;500 mins while Approach 2 takes &#x223C;260 mins, a &#x223C;2&#x00D7; speedup via PFL.</p>
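The comparison reduces to simple arithmetic; the sketch below merely re-derives the totals from the approximate per-stage timings reported above (all numbers are the rough estimates from the text):

```python
# Approximate per-stage timings (minutes) from the text.
epochs, mins_per_epoch = 10, 10
bert_run = epochs * mins_per_epoch     # one full BERT training, ~100 mins
n_tradeoffs = 5

# Approach 1: retrain BERT once per desired trade-off value.
approach_1 = n_tradeoffs * bert_run

# Approach 2: one supervised BERT run, plus SUHNPF manifold
# approximation, plus post-hoc weight extraction per trade-off.
manifold_fit, extract_per_tradeoff = 60, 20
approach_2 = bert_run + manifold_fit + n_tradeoffs * extract_per_tradeoff

print(approach_1, approach_2, round(approach_1 / approach_2, 2))
```

Note that Approach 2's cost grows by only ~20 mins per extra trade-off, versus ~100 mins for Approach 1, so the speedup widens as more trade-offs are requested.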
</sec>
<sec id="app_sec8">
<title>Discussion on metric divergence</title>
<p><xref ref-type="table" rid="T4">Table 4</xref> reports the Accuracy Difference (AD) and Overall Accuracy (OA) achieved for the different trade-off configurations of the BERT model, across three loss measures. This is a tabulated version of <xref ref-type="fig" rid="F1">Fig. 1</xref> (main text). Note that for trade-off <italic>&#x03B1;</italic> = 1, only OA is maximized, so none of the losses plays any part; hence a single common number is reported across the three columns for each dataset. As the trade-off takes each loss measure into account, we empirically observe GAP to perform best relative to the other measures, since it directly optimizes for minimizing AD.</p>
<table-wrap id="T4">
<label>Table 4.</label>
<caption><p>Performance of GAP <italic>vs.</italic> CLA and ADV across two datasets in terms of Accuracy Difference (AD), Overall Accuracy (OA), and F1. GAP consistently achieves a lower AD across <italic>&#x03B1;</italic> settings and datasets, while only a modest drop in OA is observed across methods. <italic>&#x03B1;</italic> = 1 minimizes WCE over labels only, hence the same error is reported across the three measures.</p></caption>
<table>
<thead>
<tr>
<th align="center" valign="top">&#x03B1;</th>
<th align="center" valign="top" colspan="3">Accuracy Difference</th>
<th align="center" valign="top" colspan="3">Overall Accuracy</th>
<th align="center" valign="top" colspan="3">F1</th>
</tr>
<tr>
<th align="center" valign="top"></th>
<th align="center" valign="top">GAP (Ours)</th>
<th align="center" valign="top">CLA</th>
<th align="center" valign="top">ADV</th>
<th align="center" valign="top">GAP (Ours)</th>
<th align="center" valign="top">CLA</th>
<th align="center" valign="top">ADV</th>
<th align="center" valign="top">GAP (Ours)</th>
<th align="center" valign="top">CLA</th>
<th align="center" valign="top">ADV</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center" valign="top" colspan="10">Davidson</td>
</tr>
<tr>
<td align="center" valign="top">1.00</td>
<td align="center" valign="top" colspan="3">5.9 &#x00B1; 0.1</td>
<td align="center" valign="top" colspan="3">88.9 &#x00B1; 0.2</td>
<td align="center" valign="top" colspan="3">0.71 &#x00B1; 0.02</td>
</tr>
<tr>
<td align="center" valign="top">0.75</td>
<td align="center" valign="top">4.2 &#x00B1; 0.1</td>
<td align="center" valign="top">5.0 &#x00B1; 0.1</td>
<td align="center" valign="top">4.7 &#x00B1; 0.1</td>
<td align="center" valign="top">88.5 &#x00B1; 0.3</td>
<td align="center" valign="top">88.6 &#x00B1; 0.2</td>
<td align="center" valign="top">88.2 &#x00B1; 0.4</td>
<td align="center" valign="top">0.70 &#x00B1; 0.01</td>
<td align="center" valign="top">0.69 &#x00B1; 0.01</td>
<td align="center" valign="top">0.68 &#x00B1; 0.00</td>
</tr>
<tr>
<td align="center" valign="top">0.50</td>
<td align="center" valign="top">2.7 &#x00B1; 0.1</td>
<td align="center" valign="top">3.7 &#x00B1; 0.1</td>
<td align="center" valign="top">3.2 &#x00B1; 0.1</td>
<td align="center" valign="top">88.1 &#x00B1; 0.5</td>
<td align="center" valign="top">88.3 &#x00B1; 0.5</td>
<td align="center" valign="top">87.4 &#x00B1; 0.6</td>
<td align="center" valign="top">0.69 &#x00B1; 0.02</td>
<td align="center" valign="top">0.67 &#x00B1; 0.01</td>
<td align="center" valign="top">0.65 &#x00B1; 0.01</td>
</tr>
<tr>
<td align="center" valign="top">0.25</td>
<td align="center" valign="top">1.2 &#x00B1; 0.1</td>
<td align="center" valign="top">2.4 &#x00B1; 0.0</td>
<td align="center" valign="top">2.7 &#x00B1; 0.1</td>
<td align="center" valign="top">87.7 &#x00B1; 0.2</td>
<td align="center" valign="top">87.9 &#x00B1; 0.4</td>
<td align="center" valign="top">86.8 &#x00B1; 0.6</td>
<td align="center" valign="top">0.67 &#x00B1; 0.01</td>
<td align="center" valign="top">0.65 &#x00B1; 0.00</td>
<td align="center" valign="top">0.64 &#x00B1; 0.01</td>
</tr>
<tr>
<td align="center" valign="top">0.00</td>
<td align="center" valign="top">0.1 &#x00B1; 0.0</td>
<td align="center" valign="top">0.9 &#x00B1; 0.0</td>
<td align="center" valign="top">2.4 &#x00B1; 0.1</td>
<td align="center" valign="top">87.3 &#x00B1; 0.1</td>
<td align="center" valign="top">87.6 &#x00B1; 0.2</td>
<td align="center" valign="top">86.3 &#x00B1; 0.4</td>
<td align="center" valign="top">0.66 &#x00B1; 0.00</td>
<td align="center" valign="top">0.64 &#x00B1; 0.02</td>
<td align="center" valign="top">0.61 &#x00B1; 0.01</td>
</tr>
<tr>
<td align="center" valign="top" colspan="10">Wilds</td>
</tr>
<tr>
<td align="center" valign="top">1.00</td>
<td align="center" valign="top" colspan="3">3.9 &#x00B1; 0.2</td>
<td align="center" valign="top" colspan="3">84.7 &#x00B1; 0.3</td>
<td align="center" valign="top" colspan="3">0.65 &#x00B1; 0.02</td>
</tr>
<tr>
<td align="center" valign="top">0.75</td>
<td align="center" valign="top">3.3 &#x00B1; 0.1</td>
<td align="center" valign="top">3.6 &#x00B1; 0.1</td>
<td align="center" valign="top">3.5 &#x00B1; 0.1</td>
<td align="center" valign="top">84.6 &#x00B1; 0.2</td>
<td align="center" valign="top">84.6 &#x00B1; 0.1</td>
<td align="center" valign="top">84.5 &#x00B1; 0.3</td>
<td align="center" valign="top">0.63 &#x00B1; 0.02</td>
<td align="center" valign="top">0.62 &#x00B1; 0.01</td>
<td align="center" valign="top">0.62 &#x00B1; 0.02</td>
</tr>
<tr>
<td align="center" valign="top">0.50</td>
<td align="center" valign="top">2.6 &#x00B1; 0.1</td>
<td align="center" valign="top">3.1 &#x00B1; 0.1</td>
<td align="center" valign="top">2.9 &#x00B1; 0.1</td>
<td align="center" valign="top">84.5 &#x00B1; 0.4</td>
<td align="center" valign="top">84.6 &#x00B1; 0.6</td>
<td align="center" valign="top">83.9 &#x00B1; 0.4</td>
<td align="center" valign="top">0.62 &#x00B1; 0.0</td>
<td align="center" valign="top">0.61 &#x00B1; 0.01</td>
<td align="center" valign="top">0.60 &#x00B1; 0.01</td>
</tr>
<tr>
<td align="center" valign="top">0.25</td>
<td align="center" valign="top">1.5 &#x00B1; 0.0</td>
<td align="center" valign="top">2.5 &#x00B1; 0.0</td>
<td align="center" valign="top">2.0 &#x00B1; 0.1</td>
<td align="center" valign="top">84.5 &#x00B1; 0.1</td>
<td align="center" valign="top">84.5 &#x00B1; 0.2</td>
<td align="center" valign="top">83.8 &#x00B1; 0.5</td>
<td align="center" valign="top">0.60 &#x00B1; 0.01</td>
<td align="center" valign="top">0.60 &#x00B1; 0.01</td>
<td align="center" valign="top">0.57 &#x00B1; 0.01</td>
</tr>
<tr>
<td align="center" valign="top">0.00</td>
<td align="center" valign="top">0.3 &#x00B1; 0.0</td>
<td align="center" valign="top">1.8 &#x00B1; 0.1</td>
<td align="center" valign="top">1.3 &#x00B1; 0.1</td>
<td align="center" valign="top">84.4 &#x00B1; 0.1</td>
<td align="center" valign="top">84.4 &#x00B1; 0.1</td>
<td align="center" valign="top">83.6 &#x00B1; 0.2</td>
<td align="center" valign="top">0.58 &#x00B1; 0.02</td>
<td align="center" valign="top">0.58 &#x00B1; 0.01</td>
<td align="center" valign="top">0.55 &#x00B1; 0.02</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="app_sec9">
<title>Performance of models on Wilds dataset</title>
<p><xref ref-type="table" rid="T5">Table 5</xref> shows the baseline results on the Wilds (<xref rid="R34" ref-type="bibr">Koh et al., 2021</xref>) dataset. The performance of the classifiers is similar to that in <xref ref-type="table" rid="T2">Table 2</xref>: because only Overall Accuracy (OA) is maximized, a gap remains between the group-specific accuracies. This shows the existing bias across the three neural models, with the BERT-based model performing relatively better than the rest.</p>
<table-wrap id="T5">
<label>Table 5.</label>
<caption><p>Baseline accuracy results on the Wilds (<xref rid="R34" ref-type="bibr">Koh et al., 2021</xref>) dataset when maximizing overall accuracy (OA) only. Results show a consistent bias of higher accuracy for the Majority group.</p></caption>
<table>
<thead>
<tr>
<th align="center" valign="top">Models</th>
<th align="center" valign="top">Overall %</th>
<th align="center" valign="top">Majority %</th>
<th align="center" valign="top">Minority %</th>
<th align="center" valign="top">AD%</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center" valign="top">CNN</td>
<td align="center" valign="top">83.90 &#x00B1; 0.2</td>
<td align="center" valign="top">86.11 &#x00B1; 0.1</td>
<td align="center" valign="top">81.27 &#x00B1; 0.2</td>
<td align="center" valign="top">4.84 &#x00B1; 0.2</td>
</tr>
<tr>
<td align="center" valign="top">BiLSTM</td>
<td align="center" valign="top">83.94 &#x00B1; 0.1</td>
<td align="center" valign="top">85.98 &#x00B1; 0.2</td>
<td align="center" valign="top">81.52 &#x00B1; 0.2</td>
<td align="center" valign="top">4.46 &#x00B1; 0.1</td>
</tr>
<tr>
<td align="center" valign="top">BERT</td>
<td align="center" valign="top">84.71 &#x00B1; 0.3</td>
<td align="center" valign="top">86.53 &#x00B1; 0.1</td>
<td align="center" valign="top">82.49 &#x00B1; 0.2</td>
<td align="center" valign="top">4.04 &#x00B1; 0.2</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
</app>
</app-group>
</back>
</article>