Safe but scripted: uncovering second-order bias in LLM LGBTQ+ storytelling
DOI: https://doi.org/10.47989/ir31iConf64191
Keywords: Large language models, Second-order bias, Queer representation, Trustworthy AI, Narrative equity
Abstract
Introduction. This study investigates how large language models (LLMs) narrate everyday LGBTQ+ romance.
Method. Using a matrix of 10 prompts with parallel versions for male-male (M-M), female-female (F-F), and male-female (M-F) couples, we generated 120 textual outputs from four leading LLMs (GPT-5, Claude Sonnet 4, Llama 4, and DeepSeek-V3).
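The prompt-matrix design reduces to a simple generation loop, sketched below in Python for illustration only. The prompt templates, pairing phrases, model labels, and the call_model() stub are hypothetical stand-ins for the study's actual prompts and the vendors' API clients, not the authors' harness.

from itertools import product

# Hypothetical templates standing in for the study's 10 prompts; each takes
# a couple pairing and yields one of the three parallel prompt versions.
PROMPT_TEMPLATES = [
    "Write a short story about {pairing} couple planning a quiet weekend together.",
    "Describe {pairing} couple's first anniversary dinner.",
    # ... the study used 10 such prompts
]

PAIRINGS = {
    "M-M": "a male-male",
    "F-F": "a female-female",
    "M-F": "a male-female",
}

# Label strings only; the real study would route each to its vendor's client.
MODELS = ["gpt-5", "claude-sonnet-4", "llama-4", "deepseek-v3"]

def call_model(model: str, prompt: str) -> str:
    """Placeholder: send the prompt to whichever API client serves `model`."""
    raise NotImplementedError

corpus = []
for model, template, (code, phrase) in product(MODELS, PROMPT_TEMPLATES, PAIRINGS.items()):
    prompt = template.format(pairing=phrase)
    corpus.append({"model": model, "pairing": code,
                   "prompt": prompt, "output": call_model(model, prompt)})

# With the full set: 4 models x 10 prompts x 3 pairings = 120 outputs,
# matching the corpus size reported in the abstract.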
Analysis. We performed a qualitative analysis of the corpus, first auditing for overt biases (e.g., relationship downgrading) and then using comparative thematic analysis to identify the stereotypical narrative scripts systematically applied to each couple pairing.
Results. No overt bias was found, establishing a baseline of formal equality. However, we discovered a significant second-order bias: models consistently deploy three distinct scripts. Female-female couples are cast in an ‘Emotional Haven’ trope, male-male couples in a ‘Future Builders’ script, and male-female pairs in a traditional ‘Default Template.’
Conclusion(s). LLMs currently achieve safety at the cost of authenticity, resulting in a ‘benevolent stereotyping’ of LGBTQ+ representation. This finding reveals the limits of current evaluation paradigms, which audit for overt harm but fail to measure narrative equity, as sanitised tropes replace diverse realities.
License
Copyright (c) 2026 Huiqian Lai

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
