<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" "http://jats.nlm.nih.gov/publishing/1.0/JATS-journalpublishing1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" article-type="research-article" xml:lang="en">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">IR</journal-id>
<journal-title-group>
<journal-title>Information Research</journal-title>
</journal-title-group>
<issn pub-type="epub">1368-1613</issn>
<publisher>
<publisher-name>University of Bor&#x00E5;s</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">ir30iConf47215</article-id>
<article-id pub-id-type="doi">10.47989/ir30iConf47215</article-id>
<article-categories>
<subj-group xml:lang="en">
<subject>Research article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Are we there yet? Evaluation of AI-generated metadata for online information resources</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author"><name><surname>Zavalin</surname><given-names>Vyacheslav</given-names></name>
<xref ref-type="aff" rid="aff0001"/></contrib>
<contrib contrib-type="author"><name><surname>Zavalina</surname><given-names>Oksana L.</given-names></name>
<xref ref-type="aff" rid="aff0002"/></contrib>
<aff id="aff0001"><bold>Vyacheslav Zavalin</bold> is Assistant Professor in the Department of Information Science, Texas Woman&#x2019;s University, USA. He received his Ph.D. from the University of North Texas, and his research interests are in cataloguing, metadata, and data analytics. Dr. Zavalin can be contacted at <email xlink:href="vzavalin@twu.edu">vzavalin@twu.edu</email></aff>
<aff id="aff0002"><bold>Oksana L. Zavalina</bold> is Professor in the Department of Information Science, University of North Texas, USA. She received her Ph.D. from the University of Illinois, and her research interests are in information organization in libraries and other repositories, assessment of metadata quality, and user needs. Dr. Zavalina can be contacted at <email xlink:href="Oksana.Zavalina@unt.edu">Oksana.Zavalina@unt.edu</email></aff>
</contrib-group>
<pub-date pub-type="epub"><day>06</day><month>05</month><year>2025</year></pub-date>
<pub-date pub-type="collection"><year>2025</year></pub-date>
<volume>30</volume>
<issue>i</issue>
<fpage>732</fpage>
<lpage>740</lpage>
<permissions>
<copyright-year>2025</copyright-year>
<copyright-holder>&#x00A9; 2025 The Author(s).</copyright-holder>
<license license-type="open-access" xlink:href="https://creativecommons.org/licenses/by-nc/4.0/">
<license-p>This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by-nc/4.0/">http://creativecommons.org/licenses/by-nc/4.0/</ext-link>), permitting all non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<abstract xml:lang="en">
<title>Abstract</title>
<p><bold>Introduction.</bold> Generative AI tools are increasingly used in creating descriptive metadata, the quality of which is key for information discovery and for supporting information user tasks. Machine-readable online information resources such as websites naturally lend themselves to automatic metadata creation, yet assessments of AI-generated metadata for such resources are lacking. Research on AI-generated metadata quality to date is limited to 2 metadata standards.</p>
<p><bold>Method.</bold> This experimental study assessed the quality of AI-generated descriptive metadata in the 4 most widely used standards: Dublin Core, MODS, MARC, and BIBFRAME. Three generative AI tools &#x2013; Gemini, Gemini Advanced, and ChatGPT 4 &#x2013; were used to create metadata for an educational website.</p>
<p><bold>Analysis.</bold> Zero-shot queries prompting the AI tools to generate metadata followed the same structure and included a link to each metadata scheme&#x2019;s openly accessible documentation. A comparative in-depth analysis of the accuracy and completeness of the entire resulting AI-generated metadata records was performed.</p>
<p><bold>Results.</bold> Overall, AI-generated metadata does not meet the quality threshold. ChatGPT performs somewhat better than the 2 other tools on completeness, but accuracy is similarly low across all 3 tools.</p>
<p><bold>Conclusions.</bold> The current metadata-generating effectiveness of AI tools does not support the conclusion that the involvement of human metadata experts in the creation of quality (and therefore functional) metadata can be significantly reduced without a strong negative impact on information discovery.</p>
</abstract>
</article-meta>
</front>
<body>
<sec id="sec1">
<title>Introduction</title>
<p>Artificial intelligence (AI) generative tools are increasingly experimented with to assess AI&#x2019;s usability in assisting humans with descriptive metadata creation. Metadata is crucial for the discoverability of all kinds of information resources. Major metadata standards designed to describe (with relatively small sets of metadata elements) and provide access to online information resources include the 2 versions of Dublin Core &#x2013; simple (https://www.dublincore.org/specifications/dublin-core/dces/) and extended (https://www.dublincore.org/specifications/dublin-core/dcmi-terms) &#x2013; and the Metadata Object Description Schema (MODS) (https://loc.gov/standards/mods/), introduced in the late 1990s and evolving since then. In addition, Machine-Readable Cataloguing (MARC) (https://www.loc.gov/marc/bibliographic/) &#x2013; the key metadata standard used by libraries worldwide since the late 1970s, with hundreds of millions of metadata records currently aggregated in the WorldCat database (https://search.worldcat.org/) &#x2013; was originally intended for representing analogue (text and audio-visual) information resources. It is a much more comprehensive metadata element set that has continuously evolved into a standard supporting the discovery of any type of information resource, including those available online in any file format (e.g., HTML, streaming audio and video, PDF, etc.). BIBFRAME (https://www.loc.gov/bibframe/) &#x2013; launched in the 2010s and intended as an alternative to MARC &#x2013; is both an ontology and a rich metadata element set designed to enable seamless resource discovery by emphasizing the representation of relationships and the use of Linked Data approaches in describing information resources and the entities related to them: their creators, contributors, subjects, etc.
Regardless of the metadata element set used, metadata creators consult guidelines designed iteratively by teams of experts and approved by the metadata community of practice to support decision-making by those tasked with describing information resources to facilitate discovery and access.</p>
</sec>
<sec id="sec2">
<title>Literature Review</title>
<p>Metadata&#x2019;s effectiveness in supporting information resource discovery, as well as the specific information user tasks formulated in the IFLA Library Reference Model (LRM), is the defining feature of quality metadata (<xref rid="R19" ref-type="bibr">Riva, Lebeouf, &#x0026; &#x017D;umer, 2017</xref>). The results of metadata quality evaluations inform improvements in metadata education and practice, which translate into higher-quality, more functional metadata. Evaluations are guided by metadata quality frameworks that define quality criteria and propose measures for assessing metadata against these criteria. The most influential framework, developed by Bruce and Hillman (<xref rid="R4" ref-type="bibr">2004</xref>), consists of 7 criteria. Over the years, these criteria have been adopted and adapted for analysis in many metadata studies and have informed the development of other frameworks, most notably the information quality framework by Stvilia and colleagues (2007), which consists of 22 criteria. Three metadata quality criteria defined in these frameworks &#x2013; <italic>accuracy</italic>, <italic>completeness</italic>, and <italic>consistency</italic> &#x2013; are the most often used in studies and, according to a survey of metadata managers (<xref rid="R18" ref-type="bibr">Park &#x0026; Tosaka, 2010</xref>), are considered the most important in quality assurance.</p>
<p>Most metadata evaluations so far have assessed human-created metadata, because the lion&#x2019;s share of metadata that supports discovery is created this way (even when automation tools assist in the job, and even despite the recent advances in generative AI tools that offer the potential of partially automating the process) and will likely remain largely human-generated. It is important to remember that the idea of using machines to automate the complex tasks of metadata creation is not new and certainly not unique to the current AI revolution. Decades before the emergence of generative AI, machine-generated metadata was widely researched, developed, tested, and implemented (as shown, for example, in the overview of automatic metadata generation projects in <xref rid="R9" ref-type="bibr">Greenberg, 2005</xref>). For instance, various tools designed to automate some of the relatively small-scale (in terms of the space, measured in the number of characters, occupied in the metadata record) but important and intellectually challenging metadata creation tasks (e.g., generating a classification number based on the included subject headings) have existed for decades and have been used &#x2013; with acceptable accuracy &#x2013; by many professional metadata creators to reduce the time needed for metadata record creation (e.g., <xref rid="R8" ref-type="bibr">Golub, 2021</xref>). Other popular uses of such tools were to automatically add certain codes to metadata records (e.g., those representing the languages used in an information resource and the geographic areas covered by its content). These simple tools often take the form of macros that are shared in the professional community of metadata creators and applied to represent any kind of information resource using the subscription-based OCLC Connexion professional software (e.g., <xref rid="R16" ref-type="bibr">OCLC, 2023</xref>).</p>
<p>Other electronic tools designed to assist metadata creation are openly available to anyone interested and take the shape of online forms completed, based on guidelines, by a human. The information entered into such forms by trained metadata creators is then processed and converted into a certain encoding (e.g., JSON-LD, RDF/XML, etc.). The most widely known such tool is the BIBFRAME Editor (https://bibframe.org/marva/editor/), designed to support human expert work with the BIBFRAME metadata element set and guidelines. This tool has several application profiles built into it for representing monographs, notated music, serials, cartographic resources, sound recordings, moving images, rare materials, and prints and photographs. Another notable openly accessible tool that is a set of online forms (for representing datasets, e-books, government documents, maps, microfilms, monographs, scores, serials, and theses and dissertations) is Metadata Maker (https://metadatamaker.library.illinois.edu/), developed by the University of Illinois at Urbana-Champaign Library based on BIBFRAME 2.0 (<xref rid="R10" ref-type="bibr">Heng &#x0026; Han, 2023</xref>; <xref rid="R15" ref-type="bibr">Michael &#x0026; Han, 2020</xref>). This tool allows the creation of metadata in the MARC, MARCXML, MODS, and ONIX metadata schemes.</p>
<p>Due to their very nature, machine readability, and internet indexability, online information resources such as webpages presumably make it much easier to automate metadata creation processes and substantially reduce the amount of human effort needed. With that goal, long before the emergence of generative AI tools, the DC-dot (https://www.ukoln.ac.uk/metadata/dcdot/) automation tool was developed &#x2013; using JavaScript &#x2013; and made available in the 1990s by the Innovation Support Centre at the United Kingdom Office for Library and Information Networking (UKOLN). The DC-dot tool generated metadata records encoded in either HTML or RDF/XML and was widely used by metadata creators for generating an entire metadata record (albeit a basic one, in need of verification and augmentation by human experts) in the simple version of the Dublin Core standard, which includes 15 metadata elements. Some versions of this popular tool were integrated into the professional workflow in OCLC Connexion &#x2013; the metadata management utility that feeds the WorldCat database. However, due to <italic>&#x2018;the cessation of core funding&#x2019;</italic> in 2013, this popular tool, which consistently generated reliable metadata results for webpages, was discontinued. Ten years later, generative AI tools became available, promising not only the functionality of DC-dot but more robust automatic metadata generation, capable of creating metadata records for various kinds of information resources following various metadata schemes.</p>
<p>Most published studies which assess descriptive metadata quality focus on Dublin Core, MODS, or MARC (e.g., <xref rid="R12" ref-type="bibr">Jackson et al., 2008</xref>; <xref rid="R13" ref-type="bibr">Kurtz, 2010</xref>; <xref rid="R17" ref-type="bibr">Park &#x0026; Maszaros, 2009</xref>; Weagley, Gelches &#x0026; <xref rid="R18" ref-type="bibr">Park, 2010</xref>; <xref rid="R25" ref-type="bibr">Zavalin, Zavalina &#x0026; Safa, 2021</xref>; <xref rid="R24" ref-type="bibr">Zavalin, Zavalina &#x0026; Miksa, 2021</xref>). Likely due to the much younger age of the BIBFRAME metadata element set and the vast complexity and length of BIBFRAME metadata records &#x2013; which, when of reasonable quality, are at least three times longer than MARC records for the same information resource represented in XML syntax (https://id.loc.gov/tools/bibframe/comparebf-lccn/2018958785.xml) &#x2013; no published studies have examined the quality of BIBFRAME metadata.</p>
<p>AI&#x2019;s potential uses in metadata generation have been explored by multiple authors who focused on individual specific metadata tasks, such as automatic assignment of subject headings (e.g., <xref rid="R6" ref-type="bibr">Chou &#x0026; Chu, 2022</xref>; Chow, Kao, &#x0026; <xref rid="R20" ref-type="bibr">Li, 2024</xref>; <xref rid="R11" ref-type="bibr">Ganadi et al., 2023</xref>) or classification numbers (<xref rid="R3" ref-type="bibr">Bodenhamer, 2023</xref>; Martorana et al., 2024; Zhang, Wu, &#x0026; <xref rid="R27" ref-type="bibr">Zhang, 2023</xref>). To the best of our knowledge, two studies evaluating AI-generated descriptive metadata records had been published at the time of writing this paper. Taniguchi (<xref rid="R21" ref-type="bibr">2024</xref>) compared ChatGPT-created and human-expert-created MARC records for maps, sheet music, sound recordings, etc. Brzustowicz (2023) used in-depth comparative analysis of small samples of MARC and Dublin Core records created by human metadata experts and ChatGPT to represent books and audio recordings. As of the time of writing, no studies have explored the quality of AI-generated metadata representing online resources, and none have looked beyond MARC and simple Dublin Core in analysing AI-generated metadata. Our project, preliminary results of which are reported below, addresses this research gap.</p>
</sec>
<sec id="sec3">
<title>Method</title>
<p>In this study, we used the basic and advanced versions of Google&#x2019;s Gemini AI tool, as well as ChatGPT 4, to generate AI metadata for the same resource (an English-language educational webpage (https://informationscience.unt.edu/)) in the 4 most widely used metadata schemes / element sets (ordered from simplest to most complex):
<list list-type="bullet">
<list-item><p>Extended version of Dublin Core: Dublin Core Metadata Initiative (DCMI) Metadata Terms</p></list-item>
<list-item><p>Metadata Object Description Schema (MODS)</p></list-item>
<list-item><p>MARC 21 Bibliographic Format, and</p></list-item>
<list-item><p>BIBFRAME.</p></list-item>
</list></p>
<p>Google&#x2019;s two generative AI tools &#x2013; Gemini and Gemini Advanced &#x2013; were selected for our study, in addition to ChatGPT, the most used generative AI tool in existing research on AI usability in metadata generation. The reason for selecting Gemini and Gemini Advanced was that this study&#x2019;s target information resource type is a website indexed by Google. Thus, the assumption was that these Google AI tools could create good-quality metadata by harnessing the power of Google-indexed webpages.</p>
<p>We analysed the quality of the results, relying on our team&#x2019;s expert knowledge of these metadata element sets and the existing guidelines and documentation for these standards, as well as on expert-human-created metadata for the target information resource used in this experiment, with the goal of answering the following research questions:
<list list-type="bullet">
<list-item><p>How <italic>accurate</italic> is the AI-generated metadata?</p></list-item>
<list-item><p>How <italic>complete</italic> is the AI-generated metadata?</p></list-item>
<list-item><p>How does the quality of AI-generated metadata compare across the 4 metadata element sets and 3 AI generative tools?</p></list-item>
</list></p>
<p>We evaluated completeness and accuracy based on the measures detailed in recent studies analysing the quality of metadata in Dublin Core and other metadata schemes (e.g., Aljalahmah &#x0026; Zavalina, 2024; Zavalin &#x0026; Zavalina, 2023; Zavalina &#x0026; <xref rid="R4" ref-type="bibr">Burke, 2021</xref>). The measures included:
<list list-type="bullet">
<list-item><p>for <italic>completeness:</italic> the total number of metadata fields and their instances in the record, the number of missing applicable metadata fields/instances, etc.</p></list-item>
<list-item><p>for <italic>accuracy:</italic> the use of appropriate encoding; well-formedness and validity of XML; and the number of metadata fields and instances containing errors: misrepresentation of the information resource in a date value, use of a non-authorized term from a controlled vocabulary, misuse of a field intended for other kinds of data, etc.</p></list-item>
</list></p>
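<p>As an illustration of how the completeness measures listed above operate, the following sketch assumes a metadata record has already been parsed into a mapping of field names to lists of value instances; the field names and the applicable-field checklist are hypothetical examples, not this study&#x2019;s actual coding scheme:</p>
<preformat>
```python
# Hypothetical sketch of the completeness measures described above.
# The applicable-field checklist would, as in the study, come from the
# standard's documentation and expert-created records for the same resource.

def completeness(record, applicable_fields):
    """record: dict mapping a field name to a list of value instances."""
    total_fields = len(record)
    total_instances = sum(len(values) for values in record.values())
    missing = [f for f in applicable_fields if f not in record]
    return {
        "fields": total_fields,
        "instances": total_instances,
        "missing_applicable": len(missing),
    }

# Toy Dublin Core-like record (illustrative values):
record = {
    "title": ["Department of Information Science"],
    "type": ["Text"],
    "language": ["en", "eng"],  # one field, two instances
}
applicable = ["title", "creator", "date", "type", "language", "description"]
print(completeness(record, applicable))
# prints {'fields': 3, 'instances': 4, 'missing_applicable': 3}
```
</preformat>
<p>Accuracy measures (counts of fields and instances containing errors) could be tallied analogously once each field value has been judged against the standard&#x2019;s rules.</p>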
<p>The official documentation for the examined metadata standards and the human-created metadata records for the target information resource served as the source of information on the number of applicable metadata fields, etc., used in the analysis. The prompts used in all 3 generative AI tools followed this pattern: &#x2018;<italic>Generate a metadata record for this webpage: https://informationscience.unt.edu/. Follow the [NAME &#x0026; VERSION] standard: [LINK TO STANDARD]</italic>&#x2019;. For all 4 metadata schemes, the official documentation is freely available online in machine-readable HTML or XML format.</p>
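<p>For concreteness, the prompt pattern above can be instantiated for all 4 schemes in a few lines; the documentation links below are those listed in the Introduction, and the standard-name labels are illustrative rather than the study&#x2019;s verbatim prompts:</p>
<preformat>
```python
# Illustrative reconstruction of the zero-shot prompt pattern quoted above.
# The name labels are assumptions, not the study's exact prompt text.
TARGET = "https://informationscience.unt.edu/"
STANDARDS = {
    "DCMI Metadata Terms": "https://www.dublincore.org/specifications/dublin-core/dcmi-terms/",
    "MODS": "https://loc.gov/standards/mods/",
    "MARC 21 Bibliographic Format": "https://www.loc.gov/marc/bibliographic/",
    "BIBFRAME": "https://www.loc.gov/bibframe/",
}

def build_prompt(name, link):
    return (f"Generate a metadata record for this webpage: {TARGET}. "
            f"Follow the {name} standard: {link}")

prompts = [build_prompt(name, link) for name, link in STANDARDS.items()]
```
</preformat>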
</sec>
<sec id="sec4">
<title>Preliminary Results and Discussion</title>
<p><xref ref-type="fig" rid="F1">Figure 1</xref> presents the prompts that were used in this experiment and summarizes the results of the quality assessment of the outputs generated by Gemini, Gemini Advanced, and ChatGPT. Overall, the metadata records generated by ChatGPT were more detailed. Those generated by Gemini Advanced for 3 metadata schemes &#x2013; Dublin Core, MODS, and MARC &#x2013; tended to contain more details than those generated by Gemini. This is in line with our assumptions based on Google&#x2019;s positioning of Gemini Advanced as a more powerful AI tool.</p>
<p>Dublin Core metadata generated by ChatGPT and Gemini Advanced, while incomplete, was comparable in the number of metadata fields to the OAIster-harvested Dublin Core records that previous research found to contain 8 metadata fields in most cases (<xref rid="R12" ref-type="bibr">Jackson et al., 2008</xref>). However, it is important to note that OAIster aggregation metadata follows the simple version of Dublin Core, while this experiment tasked the generative AI tools with creating metadata that follows the extended version of the Dublin Core standard (DCMI Metadata Terms), which has over 40 metadata elements as opposed to 15. For the more robust MODS and MARC metadata schemes, completeness was moderate (with a few important omissions) in the ChatGPT-generated record, and very low in both versions of the Gemini tool, although somewhat higher in Gemini Advanced (<xref ref-type="fig" rid="F1">Figure 1</xref>).</p>
<p>While ChatGPT and Gemini Advanced generated BIBFRAME records (albeit incomplete, and with basic JSON encoding as opposed to the JSON-LD or RDF serializations that support Linked Data functionality), Gemini returned this statement:</p>
<disp-quote>
<p><italic>I don&#x2019;t have a tool that can directly generate bibframe metadata for webpages, but I can search on the web using google tools. Here&#x2019;s what I found on the web: I couldn&#x2019;t find the bibframe metadata for this webpage, but there are some resources that discuss bibframe [&#x2026;].</italic></p>
</disp-quote>
<p>Accuracy was low in all 3 generative AI tools&#x2019; outputs for Dublin Core, MODS, and MARC. The proportion of metadata fields with mistakes was similarly high across tools for the same metadata scheme. Also, all 3 tools failed to generate DCMI Metadata Terms records and instead generated records that follow the simpler version of this metadata scheme: DCMES 1.1. However, unexpectedly, the simplest of the 3 tools, Gemini, was more successful than Gemini Advanced in generating Dublin Core and MODS metadata records that were properly encoded (in XML), and in structuring the MODS record (with both top-level elements and sub-elements). ChatGPT was also successful in structuring the metadata record in this hierarchical metadata scheme. ChatGPT-generated metadata was encoded in machine-readable syntax for all 4 metadata schemes (XML for Dublin Core and MODS, MARC for the MARC Bibliographic Format, and JSON for BIBFRAME). However, the XML encoding was missing key components, which negatively affected its functionality in information discovery, and the wrong version of JSON syntax was used (<xref ref-type="fig" rid="F1">Figure 1</xref>).</p>
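<p>The difference between plain JSON output and the JSON-LD serialization needed for Linked Data functionality can be illustrated with a simplified heuristic: JSON-LD documents carry reserved keywords such as &#x2018;@context&#x2019; that bind keys to vocabulary IRIs. The sketch below (with a hypothetical context URL and toy records) is illustrative only and is not the validation procedure used in this study:</p>
<preformat>
```python
import json

def looks_like_jsonld(text):
    """Simplified heuristic: treat a JSON document as JSON-LD only if a
    top-level node declares an '@context' mapping keys to vocabulary IRIs."""
    doc = json.loads(text)
    nodes = doc if isinstance(doc, list) else [doc]
    return any("@context" in node for node in nodes if isinstance(node, dict))

# Plain JSON record (toy example):
plain = '{"title": "Department of Information Science", "type": "WebSite"}'
# JSON-LD-style record (the context URL here is a hypothetical example):
linked = ('{"@context": "https://example.org/bibframe-context.json", '
          '"@type": "Work", "title": "Department of Information Science"}')
print(looks_like_jsonld(plain), looks_like_jsonld(linked))
# prints False True
```
</preformat>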
<fig id="F1">
<label>Figure 1.</label>
<caption><p>Metadata quality evaluation results summary</p></caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="images/c62-fig1.jpg"><alt-text>none</alt-text></graphic>
</fig>
</sec>
<sec id="sec5">
<title>Conclusions and next steps</title>
<p>The overall quality of AI-generated metadata in this experiment does not meet the basic expectations for functional metadata. Low-quality metadata &#x2013; such as the AI-generated metadata observed in this study &#x2013; fails to support the Find, Identify, Select, Obtain, and Explore user tasks defined in the IFLA LRM (<xref rid="R19" ref-type="bibr">Riva, Lebeouf, &#x0026; &#x017D;umer, 2017</xref>). At this point in the development of AI tools, relying solely on AI-generated metadata creation would result in significantly compromised information access. However, AI-generated metadata could be used as a starting point for human metadata creation in the Dublin Core, MARC, and MODS metadata standards, with thorough checking and editing by human metadata creators. BIBFRAME metadata generated with Google&#x2019;s Gemini and Gemini Advanced AI tools is currently non-existent, while ChatGPT-produced BIBFRAME metadata, although very incomplete and not supporting Linked Data functionality, could be used by expert metadata professionals as a starting point, with significant manual revisions.</p>
<p>This exploratory research focused on one type of information resource previously excluded from experiments with AI metadata generation: an online educational website. Future analysis will compare the accuracy and completeness of metadata generated by these same AI tools for different types of information resources. Another important direction for further research is training the generative AI tools by refining the metadata generation prompts in different ways and by providing the tool (where applicable) with feedback on the quality of the generated metadata.</p>
<p>The findings of studies such as this one, as well as experimentation with AI-generated metadata itself, need to be integrated into metadata education for information professionals. For example, instructors of cataloguing and digital library metadata courses can demonstrate to their students how these tools work for metadata generation, use AI-generated metadata in teaching the topics of metadata quality evaluation, and ask students to improve AI-generated records as part of their metadata-creation assignments.</p>
<p>The need for research into the quality of AI-generated descriptive metadata is time-sensitive because <italic>&#x2018;AI has been increasingly shaping the library management landscape in recent years&#x2019;</italic> (<xref rid="R2" ref-type="bibr">Bisht et al., 2023</xref>). While we agree that the <italic>&#x2018;integration of AI technology has the potential to greatly enhance the efficiency, accuracy, and user experience in library cataloguing&#x2019;</italic> (<xref rid="R2" ref-type="bibr">Bisht et al., 2023</xref>) by assisting human metadata creators, this potential needs to be thoroughly and objectively evaluated before any related management steps are taken. Such examinations are important for collecting the data that should inform evidence-based decisions regarding the management of information organizations (including libraries) and of such crucial functions as metadata creation.</p>
</sec>
</body>
<back>
<ref-list>
<title>References</title>
<ref id="R1"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Aljalahmah</surname><given-names>S.H.</given-names></name><name><surname>Zavalina</surname><given-names>O. L.</given-names></name></person-group><year>2024</year><article-title>Student-created Dublin Core metadata representing Arabic language eBooks: Comparison of individual and group work outcomes</article-title><source>Journal of Education for Library and Information Science (JELIS)</source><volume>65</volume><issue>3</issue><fpage>325</fpage><lpage>344</lpage><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3138/jelis-2023-0016">https://doi.org/10.3138/jelis-2023-0016</ext-link></element-citation></ref>
<ref id="R2"><element-citation publication-type="book"><person-group person-group-type="author"><name><surname>Bisht</surname><given-names>S.</given-names></name><name><surname>Nutiyal</surname><given-names>A.P.</given-names></name><name><surname>Sharma</surname><given-names>S.</given-names></name><name><surname>Sai</surname><given-names>M.</given-names></name><name><surname>Bathla</surname><given-names>N.</given-names></name><name><surname>Singh</surname><given-names>P.</given-names></name></person-group><year>2023</year><chapter-title>The role of Artificial Intelligence in shaping library management and its utilization</chapter-title><source>2023 International Conference on Disruptive Technologies (ICDT)</source><publisher-loc>Greater Noida, India</publisher-loc><fpage>467</fpage><lpage>472</lpage><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1109/ICDT57929.2023.10150520">https://doi.org/10.1109/ICDT57929.2023.10150520</ext-link></element-citation></ref>
<ref id="R3"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>Bodenhamer</surname><given-names>J.</given-names></name></person-group><year>2023</year><source>The reliability and usability of ChatGPT for library metadata</source><ext-link ext-link-type="uri" xlink:href="https://hdl.handle.net/11244/339626">https://hdl.handle.net/11244/339626</ext-link> (accessed 12 August 2024)</element-citation></ref>
<ref id="R4"><element-citation publication-type="book"><person-group person-group-type="author"><name><surname>Bruce</surname><given-names>T.R.</given-names></name><name><surname>Hillman</surname><given-names>D.I.</given-names></name></person-group><year>2004</year><chapter-title>The continuum of metadata quality: defining, expressing, exploiting</chapter-title><person-group person-group-type="editor"><name><surname>Hillman</surname><given-names>D.</given-names></name></person-group><person-group person-group-type="editor"><name><surname>Westbrook</surname><given-names>L.</given-names></name></person-group><source>Metadata in Practice</source><fpage>238</fpage><lpage>256</lpage><publisher-name>American Library Association</publisher-name><publisher-loc>Chicago</publisher-loc></element-citation></ref>
<ref id="R5"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Brzustowicz</surname><given-names>R.</given-names></name></person-group><year>2023</year><article-title>From ChatGPT to CatGPT: The implications of Artificial Intelligence on library cataloguing</article-title><source>Information Technology and Libraries</source><volume>42</volume><issue>3</issue><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.5860/ital.v42i3.16295">https://doi.org/10.5860/ital.v42i3.16295</ext-link></element-citation></ref>
<ref id="R6"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Chou</surname><given-names>C.</given-names></name><name><surname>Chu</surname><given-names>T.</given-names></name></person-group><year>2022</year><article-title>An analysis of BERT (NLP) for assisted subject indexing for project Gutenberg</article-title><source>Cataloging &#x0026; Classification Quarterly</source><volume>68</volume><issue>8</issue><fpage>807</fpage><lpage>835</lpage><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1080/01639374.2022.2138666">https://doi.org/10.1080/01639374.2022.2138666</ext-link></element-citation></ref>
<ref id="R7"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Chow</surname><given-names>E. H. C.</given-names></name><name><surname>Kao</surname><given-names>T. J.</given-names></name><name><surname>Li</surname><given-names>X.</given-names></name></person-group><year>2024</year><article-title>An experiment with the use of ChatGPT for LCSH subject assignment on electronic theses and dissertations</article-title><source>Cataloging &#x0026; Classification Quarterly</source><volume>62</volume><issue>5</issue><fpage>574</fpage><lpage>588</lpage><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1080/01639374.2024.2394516">https://doi.org/10.1080/01639374.2024.2394516</ext-link></element-citation></ref>
<ref id="R8"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Golub</surname><given-names>K.</given-names></name></person-group><year>2021</year><article-title>Automated subject indexing: An overview</article-title><source>Cataloging &#x0026; Classification Quarterly</source><volume>59</volume><issue>8</issue><fpage>702</fpage><lpage>719</lpage><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1080/01639374.2021.2012311">https://doi.org/10.1080/01639374.2021.2012311</ext-link></element-citation></ref>
<ref id="R9"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>Greenberg</surname><given-names>J.</given-names></name></person-group><year>2005</year><article-title>Metadata generation: Processes, people, and tools</article-title><source>Bulletin of the American Society for Information Science and Technology</source><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1002/bult.269">https://doi.org/10.1002/bult.269</ext-link></element-citation></ref>
<ref id="R10"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Heng</surname><given-names>G.</given-names></name><name><surname>Han</surname><given-names>M.J.</given-names></name></person-group><year>2023</year><article-title>Revamping Metadata Maker for &#x2018;Linked Data Editor&#x2019;: Thinking Out Loud</article-title><source>Code4Lib Journal</source><volume>55</volume><ext-link ext-link-type="uri" xlink:href="https://journal.code4lib.org/articles/16925">https://journal.code4lib.org/articles/16925</ext-link></element-citation></ref>
<ref id="R11"><element-citation publication-type="book"><person-group person-group-type="author"><name><surname>Ganadi</surname><given-names>A.E.</given-names></name><name><surname>Vigliermo</surname><given-names>R.A.</given-names></name><name><surname>Sala</surname><given-names>L.</given-names></name><name><surname>Vanzini</surname><given-names>M.</given-names></name><name><surname>Ruozzi</surname><given-names>F.</given-names></name><name><surname>Bergamaschi</surname><given-names>S.</given-names></name></person-group><year>2023</year><chapter-title>Bridging Islamic knowledge and AI: inquiring ChatGPT on possible categorizations for an Islamic digital library</chapter-title><source>Proceedings of the 2nd Workshop on Artificial Intelligence for Cultural Heritage (IAI4CH 2023) co-located with the 22nd International Conference of the Italian Association for Artificial Intelligence (AIxIA 2023)</source><comment>November 6, 2023</comment><publisher-loc>Roma, Italy</publisher-loc><ext-link ext-link-type="uri" xlink:href="https://ceur-ws.org/Vol-3536/03_paper.pdf">https://ceur-ws.org/Vol-3536/03_paper.pdf</ext-link><comment>(accessed 12 August 2024)</comment></element-citation></ref>
<ref id="R12"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Jackson</surname><given-names>A.S.</given-names></name><name><surname>Han</surname><given-names>M.J.</given-names></name><name><surname>Groetsch</surname><given-names>K.</given-names></name><name><surname>Mustafoff</surname><given-names>M.</given-names></name><name><surname>Cole</surname><given-names>T.W.</given-names></name></person-group><year>2008</year><article-title>Dublin Core metadata harvested through OAI-PMH</article-title><source>Journal of Library Metadata</source><volume>8</volume><issue>1</issue><fpage>5</fpage><lpage>21</lpage><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1300/J517v08n01_02">https://doi.org/10.1300/J517v08n01_02</ext-link></element-citation></ref>
<ref id="R13"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Kurtz</surname><given-names>M.</given-names></name></person-group><year>2010</year><article-title>Dublin Core, DSpace, and a brief analysis of three university repositories</article-title><source>Information Technology and Libraries</source><volume>29</volume><issue>1</issue><fpage>40</fpage><lpage>46</lpage><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.6017/ital.v29i1.3157">https://doi.org/10.6017/ital.v29i1.3157</ext-link></element-citation></ref>
<ref id="R14"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>Martorana</surname><given-names>M.</given-names></name><name><surname>Kuhn</surname><given-names>T.</given-names></name><name><surname>Stork</surname><given-names>L.</given-names></name><name><surname>Ossenbruggen</surname><given-names>J.V.</given-names></name></person-group><year>2024</year><article-title>Zero-Shot Topic Classification of Column Headers: Leveraging LLMs for Metadata Enrichment</article-title><source>International Conference on Semantic Systems</source><comment>arXiv-Computer Science-Databases</comment><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.48550/arXiv.2403.00884">https://doi.org/10.48550/arXiv.2403.00884</ext-link></element-citation></ref>
<ref id="R15"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>Michael</surname><given-names>B.</given-names></name><name><surname>Han</surname><given-names>M.J.</given-names></name></person-group><year>2020</year><article-title>Assessing BIBFRAME 2.0: Exploratory Implementation in Metadata Maker</article-title><source>International Conference on Dublin Core and Metadata Applications</source><comment>2020</comment><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.23106/dcmi.952141614">https://doi.org/10.23106/dcmi.952141614</ext-link></element-citation></ref>
<ref id="R16"><element-citation publication-type="other"><person-group person-group-type="author"><collab>OCLC</collab></person-group><year>2023</year><article-title>OCLC-supplied macros</article-title><ext-link ext-link-type="uri" xlink:href="https://help.oclc.org/Metadata_Services/Connexion/Connexion_client/Connexion_client_basics/Use_macros/Get_started/50OCLC_supplied_macros">https://help.oclc.org/Metadata_Services/Connexion/Connexion_client/Connexion_client_basics/Use_macros/Get_started/50OCLC_supplied_macros</ext-link></element-citation></ref>
<ref id="R17"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Park</surname><given-names>J.R.</given-names></name><name><surname>Maszaros</surname><given-names>S.</given-names></name></person-group><year>2009</year><article-title>Metadata Object Description Schema (MODS) in digital repositories: An exploratory study of metadata use and quality</article-title><source>Knowledge Organization</source><volume>36</volume><issue>1</issue><fpage>46</fpage><lpage>59</lpage><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.5771/0943-7444-2009-1-46">https://doi.org/10.5771/0943-7444-2009-1-46</ext-link></element-citation></ref>
<ref id="R18"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Park</surname><given-names>J.R.</given-names></name><name><surname>Tosaka</surname><given-names>Y.</given-names></name></person-group><year>2010</year><article-title>Metadata quality control in digital repositories and collections: criteria, semantics, and mechanisms</article-title><source>Cataloging &#x0026; Classification Quarterly</source><volume>48</volume><issue>8</issue><fpage>696</fpage><lpage>715</lpage><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1080/01639374.2010.508711">https://doi.org/10.1080/01639374.2010.508711</ext-link></element-citation></ref>
<ref id="R19"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>Riva</surname><given-names>P.</given-names></name><name><surname>Le Boeuf</surname><given-names>P.</given-names></name><name><surname>&#x017D;umer</surname><given-names>M.</given-names></name></person-group><year>2017</year><article-title>IFLA Library Reference Model: A Conceptual Model for Bibliographic Information</article-title><comment>Retrieved August 12, 2024</comment><ext-link ext-link-type="uri" xlink:href="https://www.ifla.org/wp-content/uploads/2019/05/assets/cataloguing/frbr-lrm/ifla-lrm-august-2017_rev201712.pdf">https://www.ifla.org/wp-content/uploads/2019/05/assets/cataloguing/frbr-lrm/ifla-lrm-august-2017_rev201712.pdf</ext-link></element-citation></ref>
<ref id="R20"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Stvilia</surname><given-names>B.</given-names></name><name><surname>Gasser</surname><given-names>L.</given-names></name><name><surname>Twidale</surname><given-names>M. B.</given-names></name><name><surname>Smith</surname><given-names>L. C.</given-names></name></person-group><year>2007</year><article-title>A framework for information quality assessment</article-title><source>Journal of the American Society for Information Science and Technology</source><volume>58</volume><fpage>1720</fpage><lpage>1733</lpage><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1002/asi.20652">https://doi.org/10.1002/asi.20652</ext-link></element-citation></ref>
<ref id="R21"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Taniguchi</surname><given-names>S.</given-names></name></person-group><year>2024</year><article-title>Creating and evaluating MARC 21 bibliographic records using ChatGPT</article-title><source>Cataloging &#x0026; Classification Quarterly</source><volume>62</volume><issue>5</issue><fpage>527</fpage><lpage>546</lpage><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1080/01639374.2024.2394513">https://doi.org/10.1080/01639374.2024.2394513</ext-link></element-citation></ref>
<ref id="R22"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Weagley</surname><given-names>J.</given-names></name><name><surname>Gelches</surname><given-names>E.</given-names></name><name><surname>Park</surname><given-names>J.</given-names></name></person-group><year>2010</year><article-title>Interoperability and metadata quality in digital video repositories: a study of Dublin Core</article-title><source>Journal of Library Metadata</source><volume>10</volume><issue>1</issue><fpage>37</fpage><lpage>57</lpage><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1080/19386380903546984">https://doi.org/10.1080/19386380903546984</ext-link></element-citation></ref>
<ref id="R23"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>Zavalin</surname><given-names>V.I.</given-names></name><name><surname>Zavalina</surname><given-names>O. L.</given-names></name></person-group><year>2023</year><article-title>Exploration of accuracy, completeness, and consistency in metadata for physical objects in museum collections</article-title><source>Information for a Better World: Normality, Virtuality, Physicality, Inclusivity: 18th International Conference</source><comment>iConference 2023, Proceedings</comment><fpage>83</fpage><lpage>90</lpage><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1007/978-3-031-28032-0_7">https://doi.org/10.1007/978-3-031-28032-0_7</ext-link></element-citation></ref>
<ref id="R24"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Zavalin</surname><given-names>V.</given-names></name><name><surname>Zavalina</surname><given-names>O.L.</given-names></name><name><surname>Miksa</surname><given-names>S.D.</given-names></name></person-group><year>2021</year><article-title>Exploration of subject representation and support of Linked Data in recently created library metadata: Examination of most widely held WorldCat bibliographic records</article-title><source>Library Resources and Technical Services</source><volume>65</volume><issue>4</issue><fpage>154</fpage><lpage>165</lpage><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.5860/lrts.65n4.1544">https://doi.org/10.5860/lrts.65n4.1544</ext-link></element-citation></ref>
<ref id="R25"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Zavalin</surname><given-names>V.</given-names></name><name><surname>Zavalina</surname><given-names>O.L.</given-names></name><name><surname>Safa</surname><given-names>R.</given-names></name></person-group><year>2021</year><article-title>Patterns of subject metadata change in MARC21 bibliographic records representing video recordings</article-title><source>Proceedings of the Association for Information Science and Technology</source><volume>58</volume><issue>1</issue><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1002/pra2.494">https://doi.org/10.1002/pra2.494</ext-link></element-citation></ref>
<ref id="R26"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Zavalina</surname><given-names>O.L.</given-names></name><name><surname>Burke</surname><given-names>M.</given-names></name></person-group><year>2021</year><article-title>Assessing skill-building in metadata instruction: Quality evaluation of Dublin Core metadata records created by graduate students</article-title><source>Journal of Education for Library and Information Science</source><volume>62</volume><issue>4</issue><fpage>423</fpage><lpage>442</lpage><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3138/jelis.62-4-2020-0083">https://doi.org/10.3138/jelis.62-4-2020-0083</ext-link></element-citation></ref>
<ref id="R27"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>Zhang</surname><given-names>S.</given-names></name><name><surname>Wu</surname><given-names>M.</given-names></name><name><surname>Zhang</surname><given-names>X.</given-names></name></person-group><year>2023</year><article-title>Utilising a large language model to annotate subject metadata: a case study in an Australian national research data catalogue</article-title><source>arXiv-CS-Computation and Language</source><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.48550/arXiv.2310.11318">https://doi.org/10.48550/arXiv.2310.11318</ext-link></element-citation></ref>
</ref-list>
</back>
</article>