Anthony Kerbage, MD¹, Tarek Souaid, MD, MPH¹, Carole Macaron, MD², Carol A. Burke, MD, FACG², Carol Rouphael, MD²
¹Cleveland Clinic, Cleveland, OH; ²Digestive Disease Institute, Cleveland Clinic, Cleveland, OH
Introduction: Adequate bowel preparation (BP) is essential for effective colonoscopy. Inadequate BP has been linked to a lower adenoma detection rate and a higher incidence of post-colonoscopy colorectal cancer. Assessment of BP quality is subjective, and grading may vary between providers. Artificial intelligence (AI) in endoscopy may offer an objective assessment of BP adequacy. We compared the accuracy of BP assessment by Generative Pre-trained Transformer (GPT)-4o and Google Gemini Pro 1.5, two large language models (LLMs) whose recent updates added image recognition.
Methods: Endoscopic images were retrieved from HyperKvasir, an open-source gastrointestinal (GI) image repository. The database images were pre-grouped into 2 categories according to the Boston Bowel Preparation Scale (BBPS) score of the colonic segment shown: BBPS 0 or 1, which we considered inadequate, and BBPS 2 or 3, which we considered adequate. A standardized, close-ended prompt instructed GPT-4o and Google Gemini Pro 1.5 to act as endoscopists and report the BBPS score for the visualized colon segment. Twenty-five randomly selected images from each category were submitted to each LLM. The primary outcome was each model's accuracy in identifying BP quality.
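The binary grading scheme in Methods (BBPS 0 or 1 counts as inadequate, BBPS 2 or 3 as adequate) can be illustrated with a minimal Python sketch. The abstract does not give the exact prompt wording, the reply format, or how non-numeric replies were handled. The function names here (`bbps_to_adequacy`, `parse_bbps_reply`, `is_correct`) are hypothetical. The sketch also assumes two things: the model's reply contains a single standalone digit 0 to 3, and a reply without a usable score counts as incorrect.

```python
import re
from typing import Optional


def bbps_to_adequacy(score: int) -> str:
    """Map a single-segment BBPS score (0-3) to the binary label used in the study."""
    if score not in (0, 1, 2, 3):
        raise ValueError(f"BBPS segment score must be 0-3, got {score!r}")
    # BBPS 0 or 1 -> inadequate; BBPS 2 or 3 -> adequate
    return "adequate" if score >= 2 else "inadequate"


def parse_bbps_reply(reply: str) -> Optional[int]:
    """Hypothetical parser (assumption): take the first standalone digit 0-3 in the reply."""
    match = re.search(r"\b[0-3]\b", reply)
    return int(match.group()) if match else None


def is_correct(reference_label: str, reply: str) -> bool:
    """A reply is scored correct if its BBPS score falls in the image's reference category.

    Assumption: replies with no parsable score are counted as incorrect.
    """
    score = parse_bbps_reply(reply)
    return score is not None and bbps_to_adequacy(score) == reference_label


if __name__ == "__main__":
    print(is_correct("adequate", "BBPS score: 2"))    # 2 maps to adequate
    print(is_correct("inadequate", "BBPS score: 3"))  # 3 maps to adequate, so the grade is wrong
```

The model is asked for the 4-level BBPS score, but it is graded only on the 2-level adequate/inadequate outcome. For example, a reply of "2" for an image in the BBPS 2 or 3 group counts as correct even if the true segment score was 3.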
Results: GPT-4o correctly identified 88% of adequate BP images and 92% of inadequate BP images, for an overall accuracy of 90%. In contrast, Google Gemini Pro 1.5 correctly identified 28% of adequate BP images and 76% of inadequate BP images, for an overall accuracy of 52%.
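With 25 images per category, each reported percentage corresponds to a whole number of correctly graded images. Because the two categories are the same size, overall accuracy equals the mean of the two per-category accuracies. The short Python sketch below checks this arithmetic. It uses only the percentages and sample size stated in the abstract; the helper names are illustrative.

```python
from fractions import Fraction

N_PER_CLASS = 25  # images per BBPS category submitted to each model (from Methods)


def correct_count(percent: int, n: int = N_PER_CLASS) -> int:
    """Convert a reported per-category accuracy back to a count of correct images."""
    count = Fraction(percent * n, 100)
    if count.denominator != 1:
        raise ValueError(f"{percent}% of {n} is not a whole number of images")
    return int(count)


def overall_accuracy(pct_adequate: int, pct_inadequate: int, n: int = N_PER_CLASS) -> Fraction:
    """Pooled accuracy over both categories (equal class sizes)."""
    correct = correct_count(pct_adequate, n) + correct_count(pct_inadequate, n)
    return Fraction(correct, 2 * n)


if __name__ == "__main__":
    reported = {"GPT-4o": (88, 92), "Gemini Pro 1.5": (28, 76)}
    for model, (adequate, inadequate) in reported.items():
        acc = overall_accuracy(adequate, inadequate)
        print(
            f"{model}: {correct_count(adequate)}/{N_PER_CLASS} adequate, "
            f"{correct_count(inadequate)}/{N_PER_CLASS} inadequate, overall {float(acc):.0%}"
        )
```

Under this reading, GPT-4o graded 45 of 50 images correctly and Gemini Pro 1.5 graded 26 of 50. The per-category split shows that Gemini's gap came mostly from misgrading adequate preparations.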
Discussion: GPT-4o outperformed Google Gemini Pro 1.5 in assessing BP quality, suggesting potential to support documentation in colonoscopy. With additional training and deep learning, future collaborations between the developers of ChatGPT, healthcare institutions, and GI technology companies could lead to such LLMs being integrated into GI endoscopy for real-time, automated, and objective BP quality documentation. This could reduce variability in the recommended interval for the next colonoscopy.
Figure 1. Example of a Prompt and Output for Each Model
Disclosures:
Anthony Kerbage indicated no relevant financial relationships.
Tarek Souaid indicated no relevant financial relationships.
Carole Macaron indicated no relevant financial relationships.