Exploring a Machine-Generated Concept Hierarchy Through the Lens of "Naive" Classification
DOI:
https://doi.org/10.29173/cais1978
Abstract
Scholarly communication research often relies on comprehensive subject classifications to evaluate research produced within or across disciplines. Such use of classification systems is less related to information retrieval and more aligned with the types of knowledge discovery tasks described by Beghtol (2003) in her discussion of naive classification. In this thesis project in progress, we investigate the machine learning processes used to generate the Microsoft Academic Graph and OpenAlex subject classification systems to better understand how this classification supports knowledge discovery in a research evaluation context, and in what ways it might be made more effective in that context.
References
Beghtol, C. (2003). Classification for information retrieval and classification for knowledge discovery: Relationships between “professional” and “naïve” classifications. Knowledge Organization, 30(2), 64–73.
Golub, K. (2021). Automated Subject Indexing: An Overview. Cataloging & Classification Quarterly, 59(8), 702–719. https://doi.org/10.1080/01639374.2021.2012311
Huang, Y., Lu, W., Liu, J., Cheng, Q., & Bu, Y. (2022). Towards transdisciplinary impact of scientific publications: A longitudinal, comprehensive, and large-scale analysis on Microsoft Academic Graph. Information Processing & Management, 59(2). https://doi.org/10.1016/j.ipm.2021.102859
Liu, J., Chen, H., Liu, Z., Bu, Y., & Gu, W. (2022). Non-linearity between referencing behavior and citation impact: A large-scale, discipline-level analysis. Journal of Informetrics, 16(3). https://doi.org/10.1016/j.joi.2022.101318
National Information Standards Organization. (2010). Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies (ANSI/NISO Z39.19-2005 (R2010)). ANSI/NISO. https://www.niso.org/
OpenAlex. (n.d.). OpenAlex: End-to-end process for concept tagging. Google Docs. https://docs.google.com/document/d/1q3jBlEexskCZaSafFDMEEY3naTeyd7GS/edit?usp=sharing&ouid=112616748913247881031&rtpof=true&sd=true
Shen, Z., Ma, H., & Wang, K. (2018). A Web-scale system for scientific knowledge exploration. Proceedings of ACL 2018, System Demonstrations, 87–92. https://doi.org/10.18653/v1/P18-4015
Xu, H., Liu, M., Bu, Y., Sun, S., Zhang, Y., Zhang, C., Acuna, D., Gray, S., Meyer, E., & Ding, Y. (2024). The impact of heterogeneous shared leadership in scientific teams. Information Processing & Management, 61(1). https://doi.org/10.1016/j.ipm.2023.103542
Zafar, L., Masood, N., & Ayaz, S. (2023). Impact of field of study (FoS) on authors' citation trend. Scientometrics, 128(4), 2557–2576. https://doi.org/10.1007/s11192-023-04660-2
License
Copyright (c) 2025 Huma Zafar

This work is licensed under a Creative Commons Attribution 4.0 International License.