The large language model GPT-4, accessed via the chatbot interface ChatGPT, demonstrates near or above human-level performance on surgical knowledge assessments drawn from two widely used question banks. On open-ended questions, ChatGPT answers 47.9% and 66.1% of items correctly, while on multiple-choice questions it achieves accuracy rates of 71.3% and 67.9%. However, inconsistent responses to repeated queries raise concerns about the safe and reliable application of large language models like ChatGPT in clinical settings.
Journal Article by Beaulieu-Jones BR, Berrigan MT (…) Brat GA et al. in BMC Surg
Copyright © 2023 Elsevier Inc. All rights reserved.
