We compared the ability of different statistical learning techniques to predict high-risk adenomatous polyps in a sample of persons undergoing colonoscopy (n=2,364). Information on demographics, lifestyle, and medical history was obtained from a questionnaire. The following approaches were assessed: 1) maximum likelihood (ML) logistic regression; 2) LASSO logistic regression; 3) bagged decision tree; 4) random forest; 5) support vector machine; and 6) neural network. The data were split into training and test sets. The DeLong test was used to compare the C-statistic (AUC) of each model within the test set. The best-performing model was the LASSO logistic regression model (AUC=0.67). The C-statistic of this model was not significantly different from that of the ML logistic regression model (AUC=0.67; p=0.40), random forest (AUC=0.62; p=0.15), or support vector machine (AUC=0.58; p=0.06), but was significantly higher than that of the bagged decision tree (AUC=0.59; p=0.04) and the neural network (AUC=0.54; p=0.01).
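The comparison of two models' C-statistics on a shared test set can be illustrated with a minimal sketch of a DeLong-type test. This is not the analysis code used in the study: the function names (`delong_test`, `_components`, `_cov`) are hypothetical, and the sketch assumes binary outcomes coded 0/1 and one predicted score per person from each model.

```python
import math


def _components(pos, neg):
    """Return (AUC, V10, V01) for one model's scores.

    V10[i] is the share of negatives ranked below positive i;
    V01[j] is the share of positives ranked above negative j.
    Ties count as one half.
    """
    def psi(x, y):
        return 1.0 if x > y else 0.5 if x == y else 0.0

    v10 = [sum(psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(psi(x, y) for x in pos) / len(pos) for y in neg]
    return sum(v10) / len(v10), v10, v01


def _cov(a, b):
    """Sample covariance (divisor n - 1) of two equal-length lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


def delong_test(y_true, scores_a, scores_b):
    """Two-sided DeLong-type test for the difference between two correlated AUCs.

    Returns (auc_a, auc_b, z, p). Both score lists must refer to the same
    persons in the same order; at least two positives and two negatives
    are required.
    """
    pos = [i for i, y in enumerate(y_true) if y == 1]
    neg = [i for i, y in enumerate(y_true) if y == 0]
    m, n = len(pos), len(neg)

    auc_a, v10_a, v01_a = _components([scores_a[i] for i in pos],
                                      [scores_a[i] for i in neg])
    auc_b, v10_b, v01_b = _components([scores_b[i] for i in pos],
                                      [scores_b[i] for i in neg])

    # Variance of (AUC_a - AUC_b) from the positive and negative components.
    var_pos = _cov(v10_a, v10_a) + _cov(v10_b, v10_b) - 2 * _cov(v10_a, v10_b)
    var_neg = _cov(v01_a, v01_a) + _cov(v01_b, v01_b) - 2 * _cov(v01_a, v01_b)
    var = var_pos / m + var_neg / n
    if var <= 0:
        raise ValueError("Degenerate case: estimated variance of the AUC difference is zero.")

    z = (auc_a - auc_b) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return auc_a, auc_b, z, p
```

Calling `delong_test(y_test, scores_lasso, scores_nn)` with hypothetical test-set outcomes and predicted scores would return both AUCs, the z statistic, and the two-sided p-value. The double loop is quadratic in sample size, which is acceptable for a test set drawn from a few thousand persons; established implementations use a faster rank-based formulation.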
Language of Oral Presentation: English
Language of Visual Aids: English