‘Appears to be about’: an evaluation of AI-generated metadata quality for community archives

Nikki Wise; Katrina Fenlon; Diana Marsh; Amanda Sorensen; Ugoma Smoke; Candy Navarette; Lucy Havens

doi:10.47989/ir31iConf64144

Authors

Nikki Wise University of Maryland
Katrina Fenlon University of Maryland
Diana Marsh University of Maryland
Amanda Sorensen University of Maryland
Ugoma Smoke University of Maryland
Candy Navarette University of Maryland
Lucy Havens University of Maryland

DOI:

https://doi.org/10.47989/ir31iConf64144

Keywords:

Cultural heritage, Community archives, Linked open data, Artificial intelligence

Abstract

Introduction. We report on an evaluation of the quality of metadata generated by a general-purpose chatbot from items in a community organisation's archive.

Method. We developed an evaluation framework by adapting quality dimensions from prior work and applied it to a sample of 140 Dublin Core metadata records that ChatGPT 4o created, in response to informal prompts, from primary sources in a community organisation's collection.

Analysis. Using independent qualitative coders and a peer review process, we assessed accuracy, conformance, consistency, completeness, objectiveness, transparency, bias, engagement, meaning and context, understandability, and provenance.

Results. We found approximately 70% of elements to be accurate. Most records were substantially complete and objective but often vague. Records exhibited significant inconsistencies in how ChatGPT completed fields, conformed to the Dublin Core schema, and interpreted primary sources.

Conclusion(s). General-purpose AI chatbots can provide substantial ‘rough draft’ descriptive records for community collections, even with minimal prompting. These records require significant human intervention to ensure quality in terms of completeness, conformance to schema, accuracy, and meaningfulness to users. We offer insights for organisations and communities working with AI chatbots for description, along with implications for broader archival practice.

