Enhancing the Discovery of Chemistry Theses by Registering Substances and Depositing in PubChem


  • Vincent F. Scalfani The University of Alabama
  • Barbara J. Dahlbach The University of Alabama
  • Jacob Robertson The University of Alabama https://orcid.org/0000-0001-6356-9585




PubChem, chemical information, data sharing, theses and dissertations


Chemical substances from theses are not widely accessible in searchable, machine-readable formats. In this article, we describe our workflow for extracting, registering, and sharing chemical substances from University of Alabama theses to enhance discovery. In total, 73 theses were selected for the project, resulting in about 3,000 substances registered using the IUPAC International Chemical Identifier and deposited in PubChem as either structure-data files or Simplified Molecular-Input Line-Entry System notations. In addition to the PubChem deposition, an archive copy was deposited in the University of Alabama Institutional Repository. The PubChem records for the substance depositions include the full bibliographic reference and a link to the thesis full text, or to the thesis metadata when the full text is not yet available. Excluding mixtures, we found that 40% of the shared substances were new to PubChem at the time of deposition. We conclude this article with a detailed discussion of our experiences, challenges, and recommendations for librarians and curators engaged in sharing chemical substance data from theses and similar documents.
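
As a rough illustration of the registration idea summarized above, the following Python sketch keys each substance by its standard InChI string, so that the same structure reported in several theses collapses to one record, and then computes the share of registered substances absent from a reference set. The `SubstanceRegistry` class, the `fraction_new` function, the thesis identifiers, and the `already_known` set are hypothetical names invented for this example. They are not the authors' tooling; a real workflow would presumably generate InChI with a cheminformatics toolkit and check novelty against PubChem itself.

```python
from dataclasses import dataclass, field


@dataclass
class SubstanceRegistry:
    """Toy registry (hypothetical): one record per standard InChI string."""
    records: dict = field(default_factory=dict)

    def register(self, inchi: str, smiles: str, thesis_id: str) -> bool:
        """Add a substance; return True if this InChI was not seen before."""
        if not inchi.startswith("InChI=1S/"):
            raise ValueError(f"not a standard InChI string: {inchi!r}")
        is_new = inchi not in self.records
        entry = self.records.setdefault(inchi, {"smiles": smiles, "theses": []})
        # Track every thesis that reports the substance, without duplicates.
        if thesis_id not in entry["theses"]:
            entry["theses"].append(thesis_id)
        return is_new


def fraction_new(registry: SubstanceRegistry, known_inchis: set) -> float:
    """Fraction of registered substances that are absent from known_inchis."""
    if not registry.records:
        return 0.0
    new = [i for i in registry.records if i not in known_inchis]
    return len(new) / len(registry.records)


if __name__ == "__main__":
    registry = SubstanceRegistry()
    # Ethanol appears in two (made-up) theses but is registered only once.
    registry.register("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3", "CCO", "thesis-A")
    registry.register("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3", "CCO", "thesis-B")
    registry.register("InChI=1S/H2O/h1H2", "O", "thesis-A")

    # Stand-in for "already in the database"; assumed for illustration only.
    already_known = {"InChI=1S/H2O/h1H2"}
    print(len(registry.records), fraction_new(registry, already_known))
```

Keying on the InChI string rather than on a name or a drawn structure is what lets differently named or differently drawn depictions of the same compound resolve to a single registry entry.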




How to Cite

Scalfani, V. F., Dahlbach, B. J., & Robertson, J. (2021). Enhancing the Discovery of Chemistry Theses by Registering Substances and Depositing in PubChem. Issues in Science and Technology Librarianship, (97). https://doi.org/10.29173/istl2566



