University of Chicago Medicine, Inflammatory Bowel Disease Center, Chicago, IL
David T. Rubin, MD1, Walter Reinisch, MD, PhD2, Neeraj Narula, MD3, Daniel Colucci, BS4, William J. Eastman, MD5, Klaus Gottlieb, MD, PhD5, Ana Lacerda, MD, MSc6, Stephen Laroux, PhD7, Irene Modesto, MD8, Emma E. Navajas, BS4, Charles C. Owen, MD, MBA5, Yeli Wang, PhD4, Shrujal Baxi, MD9 1The University of Chicago Medicine, Chicago, IL; 2Medical University of Vienna, Vienna, Wien, Austria; 3McMaster University, Hamilton, ON, Canada; 4Iterative Health Inc., Cambridge, MA; 5Eli Lilly and Company, Indianapolis, IN; 6AbbVie, North Chicago, IL; 7AbbVie, Chicago, IL; 8Pfizer Inc., New York, NY; 9Iterative Health, Cambridge, MA
Introduction: The endoscopic Mayo Score (eMS) is used to provide an objective assessment of therapeutic endpoints in ulcerative colitis (UC) clinical trials; however, inter- and intra-observer variability in the assignment of eMS grades remains a challenge, even for trained central readers. Advances in machine learning (ML) technology offer a potential solution to improve the consistency of endoscopic scoring. The methods used to develop these models may affect their accuracy in trials. The objective of this study is to provide a systematic review of how ML models are trained to generate an automated eMS grade from a full-length endoscopic procedure recording in patients with UC.
Methods: Our review includes full-length manuscripts, published in English, on ML video-level eMS prediction models from human studies in UC. PubMed/MEDLINE, EMBASE, and Web of Science were systematically searched on December 31, 2023, and the search was supplemented by reference checks and a Google search. Three rounds of title screening were conducted. Information on ML models and training data was extracted independently by two authors, with discrepancies resolved through discussion.
Results: Seven studies met the criteria for inclusion. Five studies used endoscopic video recordings as the source data for model development (dataset size ranged from 134 to 1,881 videos), and two used still images captured from previous endoscopic procedures (both with a dataset size of 16,514 images). The final model output was an ordinal eMS grade (0, 1, 2, 3) in six studies, while one study generated binary eMS outputs at three thresholds (eMS >=1, eMS >=2, eMS >=3). Model architectures generally consisted of a few key components that contributed to the final video-level output, including an informative-image ML classifier (n=6), a still-image eMS ML classifier (n=6), and video-level eMS aggregation through statistical (n=5) or ML (n=2) techniques. Data labeling strategies used to train still-image and video-level ML eMS classifiers varied across studies in the type of endoscopic data labeled, the reading paradigm, and the personnel involved.
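The staged architecture described in the Results (an informative-image filter, a per-image eMS classifier, then video-level aggregation) can be illustrated with a minimal Python sketch. It covers only the statistical aggregation step and assumes the per-frame outputs of the two classifiers are already available. The function names and the choice of "max" and "median" as aggregation rules are hypothetical; the abstract does not state which statistics the reviewed studies used.

```python
from statistics import median_low


def aggregate_video_ems(frame_grades, informative, rule="max"):
    """Collapse per-frame eMS grades (0-3) into one video-level grade.

    frame_grades: per-frame grades from a still-image eMS classifier (assumed given).
    informative:  per-frame booleans from an informative-image classifier (assumed given).
    rule:         illustrative aggregation rule, "max" or "median" (an assumption).
    """
    # Keep only the frames that the informative-image filter accepted.
    kept = [g for g, ok in zip(frame_grades, informative) if ok]
    if not kept:
        return None  # no informative frames, so no grade can be assigned
    if rule == "max":
        return max(kept)
    if rule == "median":
        # median_low always returns a grade that actually occurs in the data
        return median_low(kept)
    raise ValueError(f"unknown aggregation rule: {rule}")


def binarize(grade):
    """Express an ordinal grade as the three binary outputs eMS>=1, >=2, >=3."""
    return {f"eMS>={t}": grade >= t for t in (1, 2, 3)}


video_grade = aggregate_video_ems([0, 1, 3, 2], [True, True, False, True])
print(video_grade, binarize(video_grade))
```

In this toy input the frame graded 3 is discarded as non-informative, so the "max" rule is applied to the remaining three frames only. Studies that aggregate with an ML technique would replace `aggregate_video_ems` with a trained video-level model.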
Discussion: Several studies have trained ML models to assess the eMS in UC endoscopic videos. Training plays an important role in model generalizability, and variation in training methodology may help explain differences in model performance. Greater awareness among clinicians, regulators, investigators, and industry of how these models are built will help these stakeholders select the most appropriate models for future studies.
Figure: Figure 1. Overview of eMS labels applied to endoscopic data for model training. Labels are provided through a human annotation workflow with various paradigms described below. (A) eMS labels for training of image classification ML models (applied in six studies). (B) eMS labels for training of video classification ML models (applied in two studies). Studies in italics indicate the use of weak labels where a higher-level label is automatically assigned at a more granular level, such as automatically assigning an image a label based on the label at the video level.
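The weak-labeling idea mentioned in the figure legend, where a video-level label is automatically assigned to each image from that video, can be sketched as follows. This is an illustration under stated assumptions, not the procedure of any reviewed study: the function name, the dictionary layout, and the identifiers are all hypothetical.

```python
def weak_image_labels(video_labels, frames_by_video):
    """Propagate each video's eMS label to every frame extracted from it.

    video_labels:    {video_id: eMS grade} from human annotation (hypothetical layout).
    frames_by_video: {video_id: [frame_id, ...]} (hypothetical layout).
    Videos without a label are skipped rather than given a guessed grade.
    """
    return {
        frame_id: video_labels[video_id]
        for video_id, frames in frames_by_video.items()
        if video_id in video_labels
        for frame_id in frames
    }


labels = weak_image_labels(
    {"v1": 2, "v2": 0},
    {"v1": ["v1_f0", "v1_f1"], "v2": ["v2_f0"], "v3": ["v3_f0"]},
)
print(labels)
```

Weak labels of this kind are inexpensive to produce but can be noisy, because not every frame in a video necessarily shows the severity that determined the video-level grade.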
Note: The table for this abstract can be viewed in the ePoster Gallery section of the ACG 2024 ePoster Site or in The American Journal of Gastroenterology's abstract supplement issue, both of which will be available starting October 27, 2024.
Emma Navajas: Iterative Health, Inc. – Employee, Stock Options.
Charles C. Owen: Eli Lilly and Company – Employee, Stock Options.
Yeli Wang: Iterative Health, Inc. – Employee, Stock Options.
Shrujal Baxi: Iterative Health, Inc. – Employee, Stock Options.
David T. Rubin, MD1, Walter Reinisch, MD, PhD2, Neeraj Narula, MD3, Daniel Colucci, BS4, William J. Eastman, MD5, Klaus Gottlieb, MD, PhD5, Ana Lacerda, MD, MSc6, Stephen Laroux, PhD7, Irene Modesto, MD8, Emma E. Navajas, BS4, Charles C. Owen, MD, MBA5, Yeli Wang, PhD4, Shrujal Baxi, MD9. P0655 - Training Machine Learning Models for the Assessment of the Endoscopic Mayo Score in Ulcerative Colitis: A Systematic Review, ACG 2024 Annual Scientific Meeting Abstracts. Philadelphia, PA: American College of Gastroenterology.