‘Appears to be about’: an evaluation of AI-generated metadata quality for community archives
DOI:
https://doi.org/10.47989/ir31iConf64144
Keywords:
Cultural heritage, Community archives, Linked open data, Artificial intelligence
Abstract
Introduction. We report on an evaluation of the quality of metadata generated by a general-purpose chatbot using items from a community organisation archive.
Method. We developed an evaluation framework that adapts quality dimensions from prior work and applied it to a sample of 140 Dublin Core metadata records. ChatGPT 4o created the records, in response to informal prompts, from primary sources drawn from a community organisation collection.
Analysis. Using independent qualitative coders and a peer review process, we assessed accuracy, conformance, consistency, completeness, objectiveness, transparency, bias, engagement, meaning and context, understandability, and provenance.
Results. We found approximately 70% of elements to be accurate. Most records were substantially complete and objective but often vague. Records exhibited significant inconsistencies in how ChatGPT completed fields, conformed to the Dublin Core schema, and interpreted primary sources.
Conclusion(s). General-purpose AI chatbots have the capacity to provide substantial ‘rough draft’ descriptive records for community collections, even with minimal prompting. These records require significant human intervention to ensure quality in terms of completeness, conformance to schema, accuracy, and meaningfulness to users. We offer insights for organisations and communities working with AI chatbots for description, along with implications for broader archival practice.
License
Copyright (c) 2026 Nikki Wise, Katrina Fenlon, Diana Marsh, Amanda Sorensen, Ugoma Smoke, Candy Navarette, Lucy Havens

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
