Popular GenAI Chatbots Vary in Capabilities to Answer Academic Reference Questions

Authors

  • Lisa Shen Sam Houston State University, Huntsville, Texas, United States of America  

DOI:

https://doi.org/10.18438/eblip30949

Abstract

A Review of: 

Whitfield, S., & Yang, S. Q. (2025). Evaluating AI language models for reference services: A comparative study of ChatGPT, Gemini, and Copilot. Internet Reference Services Quarterly, 29(2), 153-167. https://doi.org/10.1080/10875301.2025.2478861  

Objective – To assess and compare the quality of responses to reference questions from popular generative artificial intelligence (GenAI) chatbots.

Design – Content analysis. 

Setting – Web browser platforms of four GenAI chatbots.  

Subjects – Responses from ChatGPT 3.5, ChatGPT 4.0, Gemini, and Copilot to a set of 28 chat reference questions submitted by Rider University Library patrons between July 1, 2023, and May 14, 2024.

Methods – Transcripts of 112 responses from the four chatbots were evaluated using a content analysis scheme adapted from a previous ChatGPT study (Yang & Mason, 2024). Both researchers independently rated every response on a 10-point scale across four categories: accuracy, relevance, friendliness, and instructiveness. The results were then analyzed using inferential statistics.

Main Results – Responses from Gemini received the highest total score (592 out of a possible 1,120 points), followed by ChatGPT 4.0 (542), ChatGPT 3.5 (502), and Copilot (433). However, each chatbot's performance fluctuated greatly in accuracy and relevance. A single-factor ANOVA test also found statistically significant differences in the quality of the GenAI chatbots' responses for relevance and friendliness, with Gemini and Copilot performing best in those categories, respectively. There were no statistically significant differences between the chatbots' performances in accuracy or instructiveness, although Gemini held the highest mean score for instructiveness.

Conclusion – The researchers concluded that popular GenAI chatbots should be used to supplement, not replace, the work of reference and instruction librarians, and noted the potential for training GenAI to address basic or local, library-specific FAQs for after-hours reference support. The authors also advised librarians to stay current with the rapid development of chatbots and other GenAI tools and to continue differentiating librarianship competencies from the services offered by these programs.

References

Glynn, L. (2006). A critical appraisal tool for library and information research. Library Hi Tech, 24(3), 387–399. https://doi.org/10.1108/07378830610692154

Whitfield, S., & Yang, S. Q. (2025). Evaluating AI language models for reference services: A comparative study of ChatGPT, Gemini, and Copilot. Internet Reference Services Quarterly, 29(2), 153-167. https://doi.org/10.1080/10875301.2025.2478861

Yang, S. Q., & Mason, S. (2024). Beyond the algorithm: Understanding how ChatGPT handles complex library queries. Internet Reference Services Quarterly, 28(2), 97–151. https://doi.org/10.1080/10875301.2023.2291441

Published

2026-03-16

How to Cite

Shen, L. (2026). Popular GenAI chatbots vary in capabilities to answer academic reference questions. Evidence Based Library and Information Practice, 21(1), 206–208. https://doi.org/10.18438/eblip30949

Issue

Section

Evidence Summaries