Entrez and BLAST: Precision and Recall in Searches of NCBI Databases.
DOI:
https://doi.org/10.29173/istl2431
Abstract
This project analyzes the results of searches for genes and proteins in the NCBI databases Gene, RefSeq RNA and RefSeq Protein. Corresponding searches were performed using the search programs Entrez and BLAST, and search recall and precision were calculated. The findings demonstrate the different types of result sets that can be expected from using different search programs and settings. Some unexpected results also indicate that the default search settings are not optimal for all searches, an important aspect of searching that information professionals should remember and communicate to researchers.
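The precision and recall mentioned in the abstract are assumed here to follow the standard information-retrieval definitions: precision is the share of retrieved records that are relevant, and recall is the share of relevant records that were retrieved. The abstract does not spell out how the authors computed them, so the following is a minimal sketch only. The function name and the record identifiers are hypothetical and are not data from the study.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    retrieved: identifiers returned by the search (e.g. an Entrez or BLAST run)
    relevant:  identifiers judged to be correct answers for the query
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant  # relevant records that were actually retrieved

    # Guard against empty sets to avoid division by zero.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical identifiers, for illustration only.
retrieved = ["rec1", "rec2", "rec3", "rec4"]
relevant = ["rec1", "rec2", "rec5"]

p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")
```

Running the same calculation on result sets from different search programs or settings would allow the kind of side-by-side comparison the abstract describes.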
License
While ISTL has always been open access and authors have always retained the copyright of their papers without restrictions, articles in issues prior to no. 75 were not licensed with Creative Commons licenses. Since issue no. 75 (Winter 2014), ISTL has licensed its work through Creative Commons licenses. Please refer to the Copyright and Licensing Information page for more information.


