Djibril M. Ba, PhD, Alireza V. Sadr, MPH, Nazia Raja-Khan, MD, Yue Zhang, MPH, Ayesha Siddiqui, MD, Jennifer Maranki, MD, Vernon M. Chinchilli, PhD, Vida Abedi, PhD
Penn State College of Medicine, Hershey, PA
Introduction: Globally, the incidence of diabetes has risen to epidemic levels, with considerable impact on both patient-level and public health. Almost one-third of patients with pancreatitis will develop diabetes within 3 years, but it is unclear to what extent this risk is attributable to different risk factors. This study aimed to build, for the first time, an effective machine learning predictive model with high sensitivity and specificity to identify pancreatitis patients at risk of developing new-onset diabetes, based on key clinical variables from large real-world data.
Methods: We conducted a real-world evidence study using the TriNetX Research Network database (2017–2024), identifying patients with acute or chronic pancreatitis (diagnosed using International Classification of Diseases, Tenth Revision, Clinical Modification codes K85, K86.0, and K86.1) and no diabetes diagnosis at the study baseline. Missing values were imputed using eXtreme Gradient Boosting (XGBoost), and 1:2 matching was used to address class imbalance. XGBoost was also used to build machine learning models to predict new-onset diabetes during up to 7 years of follow-up after pancreatitis. We further identified important clinical features and evaluated different optimization strategies.
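The abstract does not say how the 1:2 matching was carried out (for example, whether controls were matched on covariates). As a minimal sketch only, assuming plain random selection of two controls per case rather than the authors' actual procedure, the rebalancing step could look like the following. The function name `downsample_controls` and the toy cohort are hypothetical and are not study data.

```python
import random


def downsample_controls(labels, ratio=2, seed=0):
    """Keep every case and `ratio` randomly chosen controls per case.

    labels: sequence of 0/1, where 1 marks a case (new-onset diabetes).
    Returns the sorted indices of the retained patients.
    Assumption: random selection, not covariate-based matching.
    """
    rng = random.Random(seed)
    cases = [i for i, y in enumerate(labels) if y == 1]
    controls = [i for i, y in enumerate(labels) if y == 0]
    # Never ask for more controls than are available.
    n_keep = min(len(controls), ratio * len(cases))
    kept_controls = rng.sample(controls, n_keep)
    return sorted(cases + kept_controls)


# Synthetic toy cohort: 50 cases among 1,000 patients.
toy_labels = [1] * 50 + [0] * 950
kept = downsample_controls(toy_labels, ratio=2)
case_fraction = sum(toy_labels[i] for i in kept) / len(kept)
print(len(kept), round(case_fraction, 3))
```

At a 1:2 ratio, cases make up one-third of the rebalanced cohort. With the 2,353 cases reported in the Results, two controls per case would give 4,706 controls and 7,059 patients in total, which equals the reported cohort size.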
Results: We included 7,059 patients with a diagnosis of pancreatitis, of whom 2,353 (3.9%) had new-onset diabetes. After training and testing all the machine learning models, the XGBoost model performed best, with an area under the curve (AUC) of 0.57. In our preliminary findings, among the 59 features considered, factors such as age, height, length of stay (LOS), blood pressure, and laboratory-based features (such as alanine aminotransferase (ALT), aspartate aminotransferase (AST), blood urea nitrogen (BUN), hemoglobin, hematocrit, white blood cells, and platelets) appeared to have a higher impact on the model’s predictions (Figure 1).
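For readers less familiar with the metric, the AUC can be read as the probability that a randomly chosen patient who developed diabetes receives a higher predicted score than a randomly chosen patient who did not; 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking. The sketch below computes it directly from that definition. It is an illustration on made-up labels and scores, not the authors' evaluation code, and the function name `roc_auc` is hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as P(score of random case > score of random control); ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    # Compare every case score with every control score.
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up example: 3 cases and 4 controls with predicted risk scores.
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.4, 0.35, 0.8, 0.3, 0.2, 0.4]
print(roc_auc(toy_labels, toy_scores))  # 8.5 of 12 pairs ranked correctly
```

The pairwise loop is quadratic in cohort size, so it is suited to illustration; for a cohort of several thousand patients a rank-based implementation would be the more practical choice.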
Discussion: In this first large real-world study using electronic health records, we used the machine learning model XGBoost to predict new-onset diabetes following pancreatitis. Factors such as age, height, LOS, blood pressure, and several laboratory-based features appeared to have a higher impact on the model’s predictions. These features could be targeted for personalized interventions and could be used to develop a smartphone application that takes them as input and instantaneously predicts new-onset diabetes after pancreatitis.
Figure 1: Importance of the 59 features using XGBoost
Disclosures:
Djibril Ba indicated no relevant financial relationships.
Alireza Sadr indicated no relevant financial relationships.
Nazia Raja-Khan indicated no relevant financial relationships.
Yue Zhang indicated no relevant financial relationships.
Ayesha Siddiqui indicated no relevant financial relationships.
Jennifer Maranki indicated no relevant financial relationships.
Vernon Chinchilli indicated no relevant financial relationships.
Vida Abedi indicated no relevant financial relationships.
Djibril M. Ba, PhD, Alireza V. Sadr, MPH, Nazia Raja-Khan, MD, Yue Zhang, MPH, Ayesha Siddiqui, MD, Jennifer Maranki, MD, Vernon M. Chinchilli, PhD, Vida Abedi, PhD. P3452 - Prediction of New-Onset Diabetes Following Acute and Chronic Pancreatitis Using Machine Learning Algorithms: Analysis of Large Real-World Data, ACG 2024 Annual Scientific Meeting Abstracts. Philadelphia, PA: American College of Gastroenterology.