Pediatrics

AI tools vary in performance on pediatric ophthalmology questions


Microsoft Copilot outperformed ChatGPT and Google Gemini in both accuracy and readability when answering pediatric ophthalmology questions, but all 3 AI tools should be used cautiously because each can produce incorrect information, according to a study.

The study compared the accuracy and readability of ChatGPT (OpenAI), Google Gemini (Alphabet), and Microsoft Copilot when answering pediatric ophthalmology questions. Each chatbot was given 100 multiple-choice questions from the Ophtho-Questions database, a resource commonly used for board exam preparation.

Microsoft Copilot answered 74% of questions correctly, compared with 61% for ChatGPT and 60% for Gemini, a statistically significant difference. Copilot also scored highest on 3 standard readability metrics, suggesting it delivered more user-friendly explanations.

Researchers concluded that while these AI tools may assist in learning, their occasional inaccuracies mean responses should be interpreted with caution.

Reference
Bahar TS, Öcal O, Çetinkaya Yaprak A. Comparison of ChatGPT-4, Microsoft Copilot, and Google Gemini for pediatric ophthalmology questions. J Pediatr Ophthalmol Strabismus. 2025; doi:10.3928/01913913-20250404-03. Epub ahead of print. PMID: 40423505.
