Investigating Document Type Discrepancies between OpenAlex and the Web of Science
DOI: https://doi.org/10.29173/cais1943

Keywords: Bibliometrics, Research evaluation, OpenAlex, Web of Science, Library and Information Science, Open data, Metadata, Document type

Abstract
Bibliometrics, whether used for research or for research evaluation, relies on large multidisciplinary databases of research outputs and citation indices. The Web of Science (WoS) was the field's main supporting infrastructure for more than 30 years, until several new competitors emerged. OpenAlex, launched in 2022, stands out for its openness and extensive coverage. While OpenAlex may reduce or eliminate barriers to accessing bibliometric data, one concern that hinders its broader adoption for research and research evaluation is the quality of its metadata. This study assesses the metadata quality of works in OpenAlex and WoS, focusing on document-type accuracy. We observe that over 4% of the publications indexed in both OpenAlex and WoS appear to be misclassified as research articles or reviews, and that the vast majority (about 97%) of these errors occur in OpenAlex. By addressing discrepancies and misattributions in document types, this research seeks to raise awareness of data quality issues that could affect bibliometric research and evaluation outcomes.
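The comparison the abstract describes can be sketched in a few lines. This is a minimal illustration, not the authors' actual method: the `NORMALIZE` table and the `compare_types` helper are assumptions introduced here to show how vendor-specific labels (e.g., WoS "Editorial Material" vs. OpenAlex "article") might be reconciled before counting a disagreement. For real data, OpenAlex exposes a work's `type` field via its public API at `https://api.openalex.org/works/doi:<DOI>`.

```python
# Hypothetical mapping of vendor-specific document-type labels onto a
# shared vocabulary, so that WoS "Article" and OpenAlex "article" are
# not counted as a mismatch. Labels and categories are illustrative.
NORMALIZE = {
    "article": "article",
    "Article": "article",
    "review": "review",
    "Review": "review",
    "editorial": "editorial",
    "Editorial Material": "editorial",
    "letter": "letter",
    "Letter": "letter",
}


def compare_types(openalex_type: str, wos_type: str) -> bool:
    """Return True when the two sources disagree after normalization."""
    a = NORMALIZE.get(openalex_type, openalex_type.lower())
    b = NORMALIZE.get(wos_type, wos_type.lower())
    return a != b


def discrepancy_rate(pairs):
    """Share of (OpenAlex, WoS) type pairs that disagree."""
    flags = [compare_types(oa, wos) for oa, wos in pairs]
    return sum(flags) / len(flags) if flags else 0.0


# Toy records standing in for works indexed in both databases.
sample = [
    ("article", "Article"),             # agreement
    ("article", "Editorial Material"),  # OpenAlex labels an editorial as an article
    ("review", "Review"),               # agreement
]
print(f"{discrepancy_rate(sample):.2%}")
```

At scale, the same comparison would run over the full set of DOIs indexed in both databases; the paper's ~4% figure comes from that kind of matched-corpus comparison, followed by manual verification of which source is at fault.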
License
Copyright (c) 2025 Philippe Mongeon, Madelaine Hare, Geoff Krause, Rebecca Marjoram, Poppy Riddle, Rémi Toupin, Summer Wilson

This work is licensed under a Creative Commons Attribution 4.0 International License.