Large language models may enhance consensus-building on ventral rectopexy but vary significantly in reliability.
- OpenEvidence had the highest content appropriateness score (3.5/5), outperforming Gemini (3.0/5) and ChatGPT (2.8/5; p < 0.001).
- ChatGPT fabricated 53% of its citations, compared with 12% for Gemini and 0% for OpenEvidence (p < 0.001).
This suggests a need for careful selection of tools in clinical discussions.
- Only 34% of OpenEvidence citations fell below level I-III evidence, making it a trusted source for guidelines.
Journal article by Lee FG, Larson EL (…) Perry WRG, and 12 others in Dis Colon Rectum
Copyright © The ASCRS 2026.
