Performance of AI Chatbots in Surgical Decision-making for Gastroesophageal Reflux Disease

Chatbots built on large language models (LLMs) varied in how accurately they recommended surgical management for gastroesophageal reflux disease. Google Bard was the most accurate, while Copilot and Perplexity were less accurate. To realize the potential of chatbots in clinical practice, they need additional training on evidence-based health information.