Popular GenAI Chatbots Vary in Capabilities to Answer Academic Reference Questions
DOI: https://doi.org/10.18438/eblip30949
Abstract
A Review of:
Whitfield, S., & Yang, S. Q. (2025). Evaluating AI language models for reference services: A comparative study of ChatGPT, Gemini, and Copilot. Internet Reference Services Quarterly, 29(2), 153–167. https://doi.org/10.1080/10875301.2025.2478861
Objective – To assess and compare the quality of responses to reference questions from popular generative artificial intelligence (GenAI) chatbots.
Design – Content analysis.
Setting – Web browser platforms of four GenAI chatbots.
Subjects – Responses from ChatGPT 3.5, ChatGPT 4.0, Gemini, and Copilot to a set of 28 chat reference questions submitted by Rider University Library patrons between July 1, 2023, and May 14, 2024.
Methods – Transcripts of 112 responses from the four chatbots were evaluated using a content analysis scheme adapted from a previous ChatGPT study (Yang & Mason, 2024). Both researchers independently rated every response on a 10-point scale in four categories (accuracy, relevance, friendliness, and instructiveness) and analyzed the results using inferential statistics.
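The response count and the 1,120-point maximum reported in the results can be reconstructed from the study design. The worked arithmetic below is a sketch that assumes each chatbot's total sums one 10-point score per category per question (for example, a combined or averaged rating); the review does not state how the two raters' scores were merged.

```latex
\begin{aligned}
N_{\text{responses}} &= 28 \text{ questions} \times 4 \text{ chatbots} = 112 \\
S_{\max} &= 28 \text{ questions} \times 4 \text{ categories} \times 10 \text{ points} = 1{,}120 \text{ points per chatbot}
\end{aligned}
```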
Main Results – Responses from Gemini received the highest total score (592 out of a possible 1,120 points), followed by ChatGPT 4.0 (542), ChatGPT 3.5 (502), and Copilot (433). However, each chatbot's performance fluctuated greatly in accuracy and relevance. A single-factor ANOVA test also found statistically significant differences among the chatbots in the relevance and friendliness of their responses, with Gemini scoring highest for relevance and Copilot highest for friendliness. There were no statistically significant differences between the chatbots' performances in accuracy or instructiveness, although Gemini held the highest mean score for instructiveness.
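A single-factor (one-way) ANOVA asks whether differences between group means are large relative to the spread of scores within each group. The minimal Python sketch below computes the F statistic, assuming each chatbot's scores in one category form one group. The function name `one_way_anova_f` and the toy scores are hypothetical illustrations, not the study's data or the authors' analysis procedure.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """Single-factor (one-way) ANOVA F statistic.

    groups: a list of score lists, one per group (hypothetically, one per chatbot).
    Returns (F, df_between, df_within).
    """
    k = len(groups)
    all_scores = [x for g in groups for x in g]
    n_total = len(all_scores)
    grand_mean = fmean(all_scores)
    group_means = [fmean(g) for g in groups]

    # Between-group sum of squares: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of scores around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    # F = mean square between / mean square within.
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Hypothetical 10-point scores for three chatbots on four questions (NOT study data).
    toy_scores = [
        [7, 8, 9, 8],
        [5, 6, 5, 6],
        [8, 9, 9, 10],
    ]
    f, df_b, df_w = one_way_anova_f(toy_scores)
    print(f"F({df_b}, {df_w}) = {f:.2f}")  # prints "F(2, 9) = 23.40"
```

The sketch stops at the F statistic; converting it to a p-value requires the F distribution, which the Python standard library does not provide. SciPy's `scipy.stats.f_oneway` returns both the statistic and the p-value for the same test.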
Conclusion – The researchers concluded that popular GenAI chatbots should be used to supplement, not replace, the work of reference and instruction librarians, and noted the potential for training GenAI to answer basic or local, library-specific FAQs as after-hours reference support. The authors also advised librarians to stay current with the rapid development of chatbots and other GenAI tools and to continue differentiating librarianship competencies from the services these programs offer.
References
Glynn, L. (2006). A critical appraisal tool for library and information research. Library Hi Tech, 24(3), 387–399. https://doi.org/10.1108/07378830610692154
Whitfield, S., & Yang, S. Q. (2025). Evaluating AI language models for reference services: A comparative study of ChatGPT, Gemini, and Copilot. Internet Reference Services Quarterly, 29(2), 153–167. https://doi.org/10.1080/10875301.2025.2478861
Yang, S. Q., & Mason, S. (2024). Beyond the algorithm: Understanding how ChatGPT handles complex library queries. Internet Reference Services Quarterly, 28(2), 97–151. https://doi.org/10.1080/10875301.2023.2291441
License
Copyright (c) 2026 Lisa Shen

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
The Creative Commons-Attribution-Noncommercial-Share Alike License 4.0 International applies to all works published by Evidence Based Library and Information Practice. Authors will retain copyright of the work.



