Jeffrey Loeffler, MD, Hassan Al Moussawi, MD, Chapman Wei, MD, Andrew Hunton, PA, Jean M. Chalhoub, MD, Youssef El Douaihy, MD, Sherif Andrawes, MD Staten Island University Hospital, Northwell Health, Staten Island, NY
Introduction: Diagnosing choledocholithiasis is often challenging, requiring accurate risk stratification and management. Artificial intelligence tools such as ChatGPT have the potential to assist in clinical decision-making, but their effectiveness remains underexplored. This study aimed to evaluate the accuracy of ChatGPT in the risk stratification and management of choledocholithiasis. We compared ChatGPT's recommendations, with and without provision of the ASGE guidelines for choledocholithiasis, against the actual clinical decisions made by our gastroenterology (GI) team.
Methods: We retrospectively analyzed 100 cases of choledocholithiasis, providing ChatGPT with de-identified clinical information. ChatGPT was tested in two scenarios: with and without access to the ASGE guidelines. ChatGPT's risk stratification (low, intermediate, or high) and management recommendations (cholecystectomy vs. MRCP vs. ERCP) were compared against the actual decisions made by the GI team.
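As a rough illustration of the comparison described above, the agreement analysis can be sketched in a few lines of Python. The labels, the data, and the choice of metrics here are assumptions for illustration only: the abstract does not specify the study's actual data or statistical software, and "precision" is rendered here as simple label agreement, with Pearson correlation computed on ordinal risk levels.

```python
# Hedged sketch (not the authors' actual analysis): compare ChatGPT's
# risk labels against the GI team's using (a) overall label agreement
# and (b) Pearson correlation on ordinal risk levels. All data below
# are illustrative, not study data.

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Map the three risk categories to ordinal levels.
RISK = {"low": 0, "intermediate": 1, "high": 2}

# Hypothetical example labels (6 patients).
gi_team = ["high", "intermediate", "low", "high", "intermediate", "low"]
chatgpt = ["high", "intermediate", "intermediate", "high", "low", "low"]

# Fraction of patients where ChatGPT's label matched the GI team's.
agreement = sum(a == b for a, b in zip(gi_team, chatgpt)) / len(gi_team)

# Correlation between the two raters' ordinal risk assignments.
r = pearson_r([RISK[x] for x in gi_team], [RISK[x] for x in chatgpt])
```

In practice a statistical package would also supply the p-values reported in the Results; this sketch only shows the shape of the comparison.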
Results: ChatGPT trained with the guidelines achieved higher precision (87%) in risk stratification than the untrained version (62%), aligning more closely with the GI team's assessments (R=0.726, p<0.001 vs. R=0.471, p<0.001). For management, the trained ChatGPT's recommendations showed lower precision (58%) than the untrained version's (62%) (R=0.335, p<0.001 vs. R=0.199, p=0.047). Among intermediate-risk patients who underwent ERCP, the GI physicians' decisions more frequently resulted in finding a common bile duct (CBD) stone (62.1%) than the untrained ChatGPT's (41.4%). Overall, while the trained ChatGPT aligned better in risk assessment, the untrained ChatGPT's management recommendations aligned slightly better with the GI team.
Discussion: Although the trained ChatGPT demonstrated high precision in risk stratification, it failed to identify certain high-risk patients with ascending cholangitis. Both versions occasionally misread clinical information, affecting risk categorization. The untrained ChatGPT's management recommendations aligned slightly better with the GI team, although its risk stratification appeared random and lacked a clear rationale. Among intermediate-risk patients who underwent ERCP, the GI physicians' decisions more frequently resulted in finding a CBD stone, underscoring the crucial role of clinical acumen and experience in patient evaluation. These findings suggest that while AI tools can augment clinical decision-making, their integration should be approached cautiously and validated thoroughly. Further research is needed to optimize these tools for reliable clinical application.
Disclosures:
Jeffrey Loeffler indicated no relevant financial relationships.
Hassan Al Moussawi indicated no relevant financial relationships.
Chapman Wei indicated no relevant financial relationships.
Andrew Hunton indicated no relevant financial relationships.
Jean Chalhoub indicated no relevant financial relationships.
Youssef El Douaihy indicated no relevant financial relationships.
Sherif Andrawes indicated no relevant financial relationships.
Jeffrey Loeffler, MD, Hassan Al Moussawi, MD, Chapman Wei, MD, Andrew Hunton, PA, Jean M. Chalhoub, MD, Youssef El Douaihy, MD, Sherif Andrawes, MD. P3447 - Evaluation of ChatGPT in Risk Stratification and Management of Choledocholithiasis, ACG 2024 Annual Scientific Meeting Abstracts. Philadelphia, PA: American College of Gastroenterology.