An analysis of poet demographic and thematic diversity in a poetry collection for inclusive AI

Authors

  • Kahyun Choi University of Illinois Urbana-Champaign
  • Gyuri Kang Indiana University Bloomington

DOI:

https://doi.org/10.47989/ir30iConf47263

Keywords:

Digital library, responsible AI, natural language processing, poetry, dataset evaluation

Abstract

Introduction. AI technologies, such as theme classification and named entity recognition, enhance digital library accessibility. However, they may introduce biases if their training datasets lack adequate representation. For instance, prior AI models for poetry classification overlooked dataset diversity, raising concerns about representation. To address this issue, this study assesses dataset representation in a poetry collection and examines potential issues in AI model design for such collections.

Method. We annotated and published the race and ethnicity of poets in an American poetry collection curated by poets.org, which was recently used to train a poetry theme classification system. We then examined the diversity of the collection using these annotations.

Analysis. We compared the racial/ethnic composition of the collection to U.S. Census data and conducted group-exclusive top word analysis, popular theme analysis, and entropy-based analysis of theme distribution diversity to evaluate linguistic and thematic diversity.
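
As an illustration of two of the measures named above, the following is a minimal sketch, not the authors' implementation. It assumes that theme diversity is measured as Shannon entropy over theme labels and that a "group-exclusive" word is one that appears in one group's poems and in no other group's. The function names, whitespace tokenization, and toy data are all hypothetical.

```python
import math
from collections import Counter


def theme_entropy(themes):
    """Shannon entropy (in bits) of a list of theme labels.

    Higher values mean poems are spread more evenly across themes.
    Assumption: one label per list entry; a poem with several themes
    contributes several entries.
    """
    counts = Counter(themes)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def group_exclusive_top_words(poems_by_group, group, k=3):
    """Most frequent words used by `group` and by no other group.

    Assumption: naive lowercase whitespace tokenization, for illustration only.
    """
    def tokens(poems):
        return [w for poem in poems for w in poem.lower().split()]

    # Vocabulary of every group except the one being examined.
    other_vocab = {
        w
        for g, poems in poems_by_group.items()
        if g != group
        for w in tokens(poems)
    }
    counts = Counter(
        w for w in tokens(poems_by_group[group]) if w not in other_vocab
    )
    return [w for w, _ in counts.most_common(k)]


# Toy data (hypothetical, not drawn from the collection).
toy_themes = ["love", "love", "nature", "loss"]
toy_poems = {
    "group_a": ["river river moon"],
    "group_b": ["moon sun"],
}

entropy_bits = theme_entropy(toy_themes)
exclusive_a = group_exclusive_top_words(toy_poems, "group_a")
```

Comparing the entropy of the full collection with the entropy after leaving out one group's poems is one way to express how much that group contributes to thematic diversity. The paper itself should be consulted for the exact procedure used.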

Results. Our findings indicate that most underrepresented groups are well represented in the collection, with the exception of Latino/a/x American poets. Furthermore, we found that poems by poets from underrepresented groups increase the collection’s linguistic and thematic diversity.

Conclusions. To design responsible AI that embraces diversity, it is essential to assess dataset representation and support non-standard English and diverse themes beyond those popular with the general population.

Published

2025-03-11

How to Cite

Choi, K., & Kang, G. (2025). An analysis of poet demographic and thematic diversity in a poetry collection for inclusive AI. Information Research an International Electronic Journal, 30(iConf), 610–617. https://doi.org/10.47989/ir30iConf47263

Section

Peer-reviewed papers
