Machine Learning to Predict Follow-up for Abnormal Cervical Cancer Screening using Electronic Health Record Data: Model Development and Validation

Authors: Sandi L Pruitt, Rong Lu, Jasmin A Tiro, Amy E Hughes, Yang Xie, Guanghua Xiao, Chul Ahn, Eric Borton, Joanne Sanders, Celette Sugg Skinner

Category: Early Detection & Risk Prediction
Conference Year: 2019

Abstract Body:
Purpose: We developed and tested the accuracy of a machine learning algorithm to predict whether women with abnormal Pap smears complete follow-up colposcopy within 6 months (yes/no).

Methods: Using electronic health record (EHR) data from an urban, integrated safety-net healthcare system in Texas, we identified women with abnormal Pap smear results requiring colposcopy between 2010 and 2015. Women were included if they were aged 18-64, were not pregnant, did not receive colposcopy on the day of the Pap smear, and had ≥1 visit prior to their Pap smear. We extracted 76 variables from the EHR covering sociodemographic characteristics, clinical characteristics, healthcare utilization, residential address, and residential mobility (changes to residential address over time). We split the sample into training (2/3, n=3,529) and validation (1/3, n=1,764) datasets. For feature selection in the training set, we used recursive feature elimination with a random forest base learner and 10-fold cross-validation repeated 5 times. We then applied the best-fitting elastic net regression model to the validation set. We measured variable importance using the normalized absolute value of the coefficients. All analyses were performed in R.

Results: Of 5,293 women, 69% completed colposcopy within 6 months. Based on the training set, 73 variables were selected. Area under the curve (AUC) for this model was 0.97 in the training set and 0.79 in the validation set. The most influential variables represented frequency of prior healthcare utilization, clinic and provider type at the time of the abnormal Pap smear, HIV status, and patient residential mobility. Additional post hoc analyses will determine the circumstances under which our prediction model works best and will explore how influential variables relate to timely colposcopy.

Discussion: Cervical cancer screening can save lives, but only if positive results are followed by timely diagnostic colposcopy. Application of machine learning to EHR data accurately predicted colposcopy uptake (validation AUC 0.79). If implemented directly within EHR systems, our approach could let healthcare systems predict, at the time of an abnormal Pap result, which patients are less likely to complete follow-up. Targeted interventions, such as patient navigation, could then be deployed for the patients who need them most.
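
The three evaluation steps named in the Methods (the 2/3 to 1/3 split, the area under the curve, and variable importance as normalized absolute coefficients) can be illustrated with a minimal sketch. This is not the authors' code: the study was run in R, whereas this sketch uses only the Python standard library. The function names, the random seed, the variable names, and the coefficient values are all hypothetical. The abstract does not say how coefficients were normalized, so dividing by the largest absolute coefficient is an assumption. Recursive feature elimination and the elastic net fit are not reproduced here.

```python
import random


def split_train_validation(ids, train_fraction=2 / 3, seed=0):
    """Shuffle record identifiers and split them into training and validation lists.

    The seed is an arbitrary illustrative choice, not a value from the study.
    """
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


def auc(labels, scores):
    """Area under the ROC curve, computed as the fraction of (positive, negative)
    pairs in which the positive case has the higher score; ties count as half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def normalized_importance(coefficients):
    """Scale absolute coefficients so the largest equals 1.0.

    Assumption: the abstract only says "normalized absolute value of
    coefficients"; max-scaling is one common convention, not a confirmed detail.
    """
    largest = max((abs(c) for c in coefficients.values()), default=0.0)
    if largest == 0.0:
        return {name: 0.0 for name in coefficients}
    return {name: abs(c) / largest for name, c in coefficients.items()}


if __name__ == "__main__":
    # 5,293 records split 2/3 to 1/3 gives the group sizes reported in the abstract.
    train_ids, validation_ids = split_train_validation(range(5293))
    print(len(train_ids), len(validation_ids))

    # Toy labels (1 = completed colposcopy) and predicted probabilities.
    print(auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))

    # Made-up coefficients for hypothetical variables, not study results.
    toy_coefficients = {"prior_visits": 0.8, "hiv_status": -0.4, "address_changes": -0.2}
    print(normalized_importance(toy_coefficients))
```

The pairwise AUC loop is quadratic in sample size, which is acceptable for a validation set of 1,764 records; a rank-based formula would be the usual choice for larger data.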

Keywords: Machine learning, prediction, cervical cancer screening follow-up