Text Analysis of Chemistry Thesis and Dissertation Titles.
DOI: https://doi.org/10.29173/istl1700
Abstract
Programmatic text analysis can be used to understand patterns and reveal trends in data that would otherwise be difficult or impossible to uncover with manual coding methods. This work uses programmatic text analysis, specifically term frequency counts, to study nearly 10,000 chemistry thesis and dissertation titles from 1911-2015. The thesis and dissertation titles were collected from nine major research universities across the southeastern United States. The libraries of all nine are members of the Association of Southeastern Research Libraries (ASERL). Text analysis scripts were written in both MATLAB and Mathematica and used to extract the most common words and phrases from the titles. Some of the most common terms appearing in chemistry thesis and dissertation titles included synthesis, spectra, reaction, application, mass spectra, and nuclear magnetic resonance. Word usage over time was studied and used to reveal general research trends in chemistry. All data, programming scripts, and instruction methods are provided openly to the community. This article will be of interest to researchers and librarians interested in text analysis and chemistry research trends.
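As a rough illustration of the term frequency counting described above, the following minimal Python sketch counts single words and two-word phrases across a handful of titles. It is not the author's MATLAB or Mathematica code: the sample titles, the stop-word list, and the function names `tokenize` and `term_counts` are all hypothetical, chosen only to show the general idea.

```python
import re
from collections import Counter

# Hypothetical sample titles; the article's data set is not reproduced here.
TITLES = [
    "Synthesis and mass spectra of substituted pyridines",
    "Nuclear magnetic resonance studies of reaction intermediates",
    "Synthesis of novel polymers and their application",
    "Mass spectra of organometallic compounds",
]

# Small illustrative stop-word list (an assumption, not the article's list).
STOP_WORDS = {"a", "an", "and", "of", "the", "their", "in", "on", "for", "to"}


def tokenize(title):
    """Lowercase a title and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", title.lower())


def term_counts(titles, n=1):
    """Count n-word terms across titles, skipping terms that contain a stop word."""
    counts = Counter()
    for title in titles:
        tokens = tokenize(title)
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if not any(tok in STOP_WORDS for tok in gram):
                counts[" ".join(gram)] += 1
    return counts


word_counts = term_counts(TITLES, n=1)    # single words
phrase_counts = term_counts(TITLES, n=2)  # two-word phrases
print(word_counts.most_common(3))
print(phrase_counts.most_common(3))
```

Counting each term once per occurrence keeps the sketch simple. A study of word usage over time would additionally group titles by year before counting, and might apply stemming so that variants such as "reaction" and "reactions" are tallied together.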
License
Copyright (c) 2017 Vincent F. Scalfani
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.