Text Analysis of Chemistry Thesis and Dissertation Titles.

Authors

  • Vincent F. Scalfani

DOI:

https://doi.org/10.29173/istl1700

Abstract

Programmatic text analysis can be used to understand patterns and reveal trends in data that would otherwise be difficult or impossible to uncover with manual coding methods. This work uses programmatic text analysis, specifically term frequency counts, to study nearly 10,000 chemistry thesis and dissertation titles from 1911-2015. The thesis and dissertation titles were collected from nine major research universities across the southeastern United States. The libraries of all nine are members of the Association of Southeastern Research Libraries (ASERL). Text analysis scripts were written in both MATLAB and Mathematica and used to extract the most common words and phrases from the titles. Some of the most common terms appearing in chemistry thesis and dissertation titles included synthesis, spectra, reaction, application, mass spectra, and nuclear magnetic resonance. Word usage over time was studied and used to reveal general research trends in chemistry. All data, programming scripts, and instruction methods are provided openly to the community. This article will be of interest to researchers and librarians interested in text analysis and chemistry research trends.
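
As a rough illustration of what a term frequency count over titles involves, the following is a minimal Python sketch. It is not the author's MATLAB or Mathematica code; the sample titles, the short stop-word list, and the function names (tokenize, term_counts) are all hypothetical and chosen only for this example.

```python
import re
from collections import Counter

# Hypothetical sample titles, invented for illustration only
# (not taken from the study's data set).
titles = [
    "Synthesis and mass spectra of substituted pyridines",
    "Nuclear magnetic resonance studies of polymer synthesis",
    "Mass spectra of organometallic compounds",
]

# Small illustrative stop-word list (an assumption); a real analysis
# would likely use a fuller list.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}


def tokenize(title):
    """Lowercase a title and split it into alphabetic words."""
    return re.findall(r"[a-z]+", title.lower())


def term_counts(titles, n=1):
    """Count n-word terms across titles (n=1 words, n=2 two-word phrases).

    A term is skipped if any of its words is a stop word, so phrases are
    always runs of adjacent words from the original title.
    """
    counts = Counter()
    for title in titles:
        tokens = tokenize(title)
        for i in range(len(tokens) - n + 1):
            term = tokens[i:i + n]
            if any(word in STOP_WORDS for word in term):
                continue
            counts[" ".join(term)] += 1
    return counts


word_counts = term_counts(titles, n=1)
phrase_counts = term_counts(titles, n=2)

# Show the most frequent single words and two-word phrases.
print(word_counts.most_common(5))
print(phrase_counts.most_common(5))
```

Calling term_counts with n=3 counts three-word phrases in the same way. Grouping the titles by year before counting would be one possible way to look at word usage over time, though that step is not shown here.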

Published

2017-06-01

How to Cite

Scalfani, V. F. (2017). Text Analysis of Chemistry Thesis and Dissertation Titles. Issues in Science and Technology Librarianship, (86). https://doi.org/10.29173/istl1700

Section

Refereed Articles