Prior Steps into Knowledge Mapping: Text Mining Application and Comparison




Bibliometrics is increasingly used by the knowledge community and librarians to analyze patterns in knowledge. In practice, data from the databases that supply bibliometric information is not always clean, so pre-processing is required. Several previous studies have shown that bibliometric analysis begins with a simple pre-processing step. This research aims to use text mining for pre-processing: finding the base terms of the keywords that appear, in effect constructing a controlled vocabulary for a bibliographic dataset. Keywords were cleaned by stemming in RapidMiner, and Bibliometrix was used to compare the results. A total of 85 keywords were merged into base terms. The networks built from the raw data and from the pre-processed data differ, which leads to differences in the resulting analysis. The process built here can also be reused in a variety of real-world situations.
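To make the idea concrete, the following is a minimal Python sketch of keyword stemming and its effect on a keyword co-occurrence network. It is not the RapidMiner process used in the study: `toy_stem` is a hypothetical, deliberately simplified suffix stripper (not the Porter or Snowball algorithm), and the sample `records` are invented for illustration only.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Toy suffix rules, checked in order. This is an illustrative assumption,
# NOT the Porter/Snowball algorithm and NOT RapidMiner's stemming operator.
SUFFIX_RULES = [("ies", "y"), ("sses", "ss"), ("ing", ""), ("s", "")]


def toy_stem(word):
    """Lowercase a word and strip one common English suffix."""
    word = word.lower()
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            if suffix == "s" and word.endswith("ss"):
                continue  # keep words such as "class" intact
            return word[: len(word) - len(suffix)] + replacement
    return word


def normalize_keyword(keyword):
    """Stem every token of a (possibly multi-word) keyword."""
    return " ".join(toy_stem(token) for token in keyword.split())


def build_vocabulary(records):
    """Map each base term to the set of raw keyword variants it merges."""
    vocabulary = defaultdict(set)
    for keywords in records:
        for keyword in keywords:
            vocabulary[normalize_keyword(keyword)].add(keyword)
    return dict(vocabulary)


def cooccurrence_edges(records, normalize=str):
    """Count how often each pair of keywords appears in the same record."""
    edges = Counter()
    for keywords in records:
        terms = sorted({normalize(keyword) for keyword in keywords})
        for pair in combinations(terms, 2):
            edges[pair] += 1
    return edges


# Hypothetical author-keyword lists, one list per bibliographic record.
records = [
    ["Bibliometrics", "Text Mining", "Libraries"],
    ["bibliometric", "text mining", "library"],
    ["Bibliometrics", "Networks"],
    ["bibliometric", "network"],
]

raw_edges = cooccurrence_edges(records)
clean_edges = cooccurrence_edges(records, normalize=normalize_keyword)
vocabulary = build_vocabulary(records)

print("distinct edges (raw):", len(raw_edges))
print("distinct edges (stemmed):", len(clean_edges))
for base_term, variants in sorted(vocabulary.items()):
    print(base_term, "<-", sorted(variants))
```

In this toy data, spelling and inflection variants fragment the raw network into more nodes and edges than the stemmed one, which is the kind of difference the abstract reports between raw and pre-processed data. A real workflow would likely substitute an established stemmer for `toy_stem` and review the merged terms by hand, since stems such as "text min" are not readable labels.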




Aria, M., & Cuccurullo, C. (2017). bibliometrix: An R-tool for comprehensive science mapping analysis. Journal of Informetrics, 11(4), 959–975.

Chapman, P., Clinton, J., Kerber, R., Khabaza, T., Reinartz, T., Shearer, C., & Wirth, R. (2000). CRISP-DM 1.0: Step-by-step data mining guide. SPSS.

CheshmehSohrabi, M., & Mashhadi, A. (2022). Using data mining, text mining, and bibliometric techniques to the research trends and gaps in the field of language and linguistics. Journal of Psycholinguistic Research.

Gumpenberger, C., Wieland, M., & Gorraiz, J. (2012). Bibliometric practices and activities at the University of Vienna. Library Management, 33(3), 174–183.

Han, J., Kang, H.-J., Kim, M., & Kwon, G. H. (2020). Mapping the intellectual structure of research on surgery with mixed reality: Bibliometric network analysis (2000–2019). Journal of Biomedical Informatics, 109, 103516.

Lamba, M., & Madhusudhan, M. (2018). Application of sentiment analysis in libraries to provide temporal information service: A case study on various facets of productivity. Social Network Analysis and Mining, 8(1), 63.

Li, D., Dai, F.-M., Xu, J.-J., & Jiang, M.-D. (2020). Characterizing hotspots and frontier landscapes of diabetes-specific distress from 2000 to 2018: A bibliometric study. BioMed Research International, 2020, 1–13.

Moore, M. T. (2017). Constructing a sentiment analysis model for LibQUAL+ comments. Performance Measurement and Metrics, 18(1), 78–87.

Moral-Muñoz, J. A., Herrera-Viedma, E., Santisteban-Espejo, A., & Cobo, M. J. (2020). Software tools for conducting bibliometric analysis in science: An up-to-date review. El Profesional de La Información, 29(1).

Obidat, A. H. (2022). Bibliometric analysis of global scientific literature on the accessibility of an integrated e-learning model for students with disabilities. Contemporary Educational Technology, 14(3), ep374.

Porter, M. F. (2001). Snowball: A language for stemming algorithms.

Schröer, C., Kruse, F., & Gómez, J. M. (2021). A systematic literature review on applying CRISP-DM process model. Procedia Computer Science, 181, 526–534.

Wang, X., Xu, Z., & Škare, M. (2020). A bibliometric analysis of Economic Research-Ekonomska Istraživanja (2007–2019). Economic Research-Ekonomska Istraživanja, 33(1), 865–886.

Wang, X., Xu, Z., Su, S.-F., & Zhou, W. (2021). A comprehensive bibliometric analysis of uncertain group decision making from 1980 to 2019. Information Sciences, 547, 328–353.




How to Cite

Santosa, F. A. (2023). Prior Steps into Knowledge Mapping: Text Mining Application and Comparison. Issues in Science and Technology Librarianship, (102).



Tips from the Experts