OpenAlex in focus: Metadata quality of publication type and language fields in an open peer review corpus
DOI: https://doi.org/10.47989/ir31iConf64207
Keywords: OpenAlex, Metadata quality, Publication type classification, Language metadata, Crossref
Abstract
Introduction. OpenAlex is widely used as a free bibliographic database for bibliometric and scholarly communication research. Despite its openness and coverage, its metadata contains inconsistencies that require systematic cleaning. This study examines metadata quality in OpenAlex, focusing on publication type and language.
Method. Publications on open peer review were retrieved from OpenAlex. After filtering and deduplication, 6,640 records were manually checked. Document type and language fields were cross-verified against publisher sources, and Crossref publication types were collected for comparison.
Analysis. Manual classification was harmonised across categories to ensure comparability. The main focus was to evaluate the agreement between OpenAlex and manual classifications of type and language, and to assess the consistency of Crossref publication types with both.
Results. Of the 6,640 records, 2,878 (43%) showed publication type discrepancies, with the ‘Article’ label most often misapplied. Crossref types aligned more closely with OpenAlex in broad categories but diverged from the manual verification. In addition, 222 records (3.3%) had language mismatches, typically an English label assigned to a non-English work.
Conclusion(s). OpenAlex is a valuable infrastructure, yet its metadata for publication type and language shows notable inconsistencies. Researchers should apply systematic cleaning and validation before using OpenAlex or similar databases.
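The comparison described in the abstract can be illustrated with a minimal sketch. It assumes each record carries an OpenAlex type, a Crossref type and a manually verified type; the field names, the harmonisation map and the toy records are hypothetical and are not taken from the study. Only the totals in the final lines (6,640, 2,878 and 222) come from the abstract.

```python
from collections import Counter

# Hypothetical harmonisation map (an assumption, not the authors' scheme):
# source-specific labels are folded into shared broad categories.
HARMONISED = {
    "article": "article",
    "journal-article": "article",   # Crossref label
    "review": "review",
    "editorial": "editorial",
    "letter": "letter",
    "preprint": "preprint",
    "posted-content": "preprint",   # Crossref label
}


def harmonise(label):
    """Map a raw type label to a broad category; unknown labels become 'other'."""
    key = (label or "").strip().lower()
    return HARMONISED.get(key, "other")


def discrepancy_report(records, source_field, reference_field="manual_type"):
    """Compare one source's type labels with the manually verified labels."""
    mismatches = [
        r for r in records
        if harmonise(r[source_field]) != harmonise(r[reference_field])
    ]
    # Count (source label, manual label) pairs to see which label is misapplied.
    pairs = Counter(
        (harmonise(r[source_field]), harmonise(r[reference_field]))
        for r in mismatches
    )
    rate = len(mismatches) / len(records) if records else 0.0
    return {"n": len(records), "mismatches": len(mismatches),
            "rate": rate, "pairs": pairs}


# Toy records for illustration only (not data from the study).
records = [
    {"openalex_type": "article", "crossref_type": "journal-article", "manual_type": "editorial"},
    {"openalex_type": "article", "crossref_type": "journal-article", "manual_type": "article"},
    {"openalex_type": "review", "crossref_type": "journal-article", "manual_type": "review"},
    {"openalex_type": "article", "crossref_type": "journal-article", "manual_type": "letter"},
]

openalex_report = discrepancy_report(records, "openalex_type")
crossref_report = discrepancy_report(records, "crossref_type")

# Shares reported in the abstract, recomputed from the stated counts.
type_share = round(2878 / 6640 * 100, 1)      # publication type discrepancies
language_share = round(222 / 6640 * 100, 1)   # language mismatches
```

Counting the mismatched pairs, rather than only the overall rate, is what makes it possible to see which label (for example ‘Article’) accounts for most of the disagreement.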
License
Copyright (c) 2026 Güleda Doğan, Ayça Nur Sezen

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
