Exploring a Machine-Generated Concept Hierarchy Through the Lens of "Naive" Classification

DOI: https://doi.org/10.29173/cais1978

Abstract

Scholarly communication research often relies on comprehensive subject classifications to evaluate research produced within or across disciplines. Such use of classification systems is less related to information retrieval and more aligned with the types of knowledge discovery tasks described by Beghtol (2003) in her discussion of naive classification. In this thesis project in progress, we investigate the machine learning processes used to generate the Microsoft Academic Graph and OpenAlex subject classification systems to better understand how this classification supports knowledge discovery in a research evaluation context, and in what ways it might be made more effective in that context.

References

Beghtol, C. (2003). Classification for information retrieval and classification for knowledge discovery: Relationships between “professional” and “naïve” classifications. Knowledge Organization, 30(2), 64–73.

Golub, K. (2021). Automated Subject Indexing: An Overview. Cataloging & Classification Quarterly, 59(8), 702–719. https://doi.org/10.1080/01639374.2021.2012311

Huang, Y., Lu, W., Liu, J., Cheng, Q., & Bu, Y. (2022). Towards transdisciplinary impact of scientific publications: A longitudinal, comprehensive, and large-scale analysis on Microsoft Academic Graph. Information Processing & Management, 59(2). https://doi.org/10.1016/j.ipm.2021.102859

Liu, J., Chen, H., Liu, Z., Bu, Y., & Gu, W. (2022). Non-linearity between referencing behavior and citation impact: A large-scale, discipline-level analysis. Journal of Informetrics, 16(3). https://doi.org/10.1016/j.joi.2022.101318

National Information Standards Organization. (2010). Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies (ANSI/NISO Z39.19-2005 (R2010)). ANSI/NISO. https://www.niso.org/

OpenAlex. (n.d.). OpenAlex: End-to-end process for concept tagging. Google Docs. https://docs.google.com/document/d/1q3jBlEexskCZaSafFDMEEY3naTeyd7GS/edit?usp=sharing&ouid=112616748913247881031&rtpof=true&sd=true

Shen, Z., Ma, H., & Wang, K. (2018). A Web-scale system for scientific knowledge exploration. Proceedings of ACL 2018, System Demonstrations, 87–92. https://doi.org/10.18653/v1/P18-4015

Xu, H., Liu, M., Bu, Y., Sun, S., Zhang, Y., Zhang, C., Acuna, D., Gray, S., Meyer, E., & Ding, Y. (2024). The impact of heterogeneous shared leadership in scientific teams. Information Processing & Management, 61(1). https://doi.org/10.1016/j.ipm.2023.103542

Zafar, L., Masood, N., & Ayaz, S. (2023). Impact of field of study (FoS) on authors' citation trend. Scientometrics, 128(4), 2557–2576. https://doi.org/10.1007/s11192-023-04660-2

Published

2025-02-07

How to Cite

Zafar, H. (2025). Exploring a Machine-Generated Concept Hierarchy Through the Lens of "Naive" Classification. Proceedings of the Annual Conference of CAIS / Actes du congrès annuel de l’ACSI. https://doi.org/10.29173/cais1978
