Developing a risk prediction model for head and neck cancer survival using machine learning

Authors: Yang D, Karanth SD, Wheeler M, Guo Y, Bian J, Yoon A, Braithwaite D

Category: Survivorship & Health Outcomes/Comparative Effectiveness Research
Conference Year: 2023

Abstract Body:
Purpose: Machine learning methods can be used to analyze time-to-event outcomes. This study aimed to use machine learning methods to develop an alternative risk prediction model for head and neck cancer survival.

Methods: Using the Surveillance, Epidemiology, and End Results (SEER) database, we identified 99,335 patients in the United States diagnosed between 2006 and 2017 with primary cancer of the oral cavity, pharynx (hypopharynx, nasopharynx, oropharynx), salivary glands, nasal cavity, middle ear, or larynx. Race, age, sex, marital status, the Yost index (an index of socioeconomic status), treatment type, cancer type, cancer stage, and the number of cancer diagnoses were considered as risk factors. The outcome was overall survival of patients with head and neck cancer. Patients were split into a training set (80%, n=79,468) and a validation set (20%, n=19,867). We used the Cox proportional hazards model as the baseline model. We then tested three traditional machine learning models for survival analysis (random survival forest, gradient boosted model, and survival support vector machine) and two deep learning methods (DeepHit and DeepSurv). We compared the performance of each machine learning model with that of the Cox proportional hazards model using the concordance index (C-index). We also assessed feature importance using a penalized Cox model.

Results: During follow-up, 41,572 patients died. Most machine learning methods outperformed traditional Cox regression (C-index=0.71) in predicting overall survival of head and neck cancer. DeepSurv performed best (C-index=0.98), followed by random survival forest (C-index=0.74), gradient boosted model (C-index=0.73), and survival support vector machine (C-index=0.73); DeepHit (C-index=0.71) performed on par with the Cox model.
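The C-index used to compare the models can be made concrete with a minimal sketch of Harrell's concordance index for right-censored data. It assumes that a higher predicted risk score should correspond to a shorter survival time. The function name `harrell_c_index` and the toy inputs are hypothetical; the abstract does not say which C-index variant or software was used, and this simplified version skips pairs with tied survival times.

```python
from typing import Sequence


def harrell_c_index(
    times: Sequence[float], events: Sequence[int], risk: Sequence[float]
) -> float:
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when subject i has the shorter observed time
    AND actually experienced the event (events[i] == 1). The pair is
    concordant when subject i also has the higher predicted risk; ties in
    predicted risk count as half-concordant. Pairs with tied times are
    skipped in this simplified sketch.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Hypothetical toy data: 4 subjects, subject 3 is censored at t=6.
    print(harrell_c_index([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.6, 0.7, 0.1]))
```

For the toy call above, five pairs are comparable and four are concordant, giving 0.8. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, the scale on which the reported values (0.71 to 0.98) should be read.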
Feature importance assessments from both the random survival forest and the penalized Cox model suggested that early stage was the factor most strongly associated with favorable survival and that older age was the factor most strongly associated with unfavorable survival.

Conclusion: Our study suggests that machine learning methods can predict head and neck cancer survival better than the traditional Cox model, as measured by the C-index.
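The abstract does not specify the penalty or software behind its penalized Cox model. As an illustration only, the sketch below assumes a lasso (L1) penalty on the Breslow partial likelihood, fitted by proximal gradient descent (ISTA) in pure Python; all function names and the toy data are hypothetical. Coefficients that the penalty shrinks exactly to zero drop out, and, for features on comparable scales, the sign and size of the remaining coefficients offer one way to read importance: a negative coefficient means a lower hazard (more favorable survival).

```python
import math
from typing import List, Sequence

# Hypothetical sketch: lasso-penalized Cox model fitted by proximal gradient
# descent (ISTA). Not the study's implementation or software.


def _linear_predictor(X: Sequence[Sequence[float]], beta: Sequence[float]) -> List[float]:
    """eta_i = x_i . beta for every subject."""
    return [sum(x_k * b_k for x_k, b_k in zip(row, beta)) for row in X]


def neg_log_partial_likelihood(X, times, events, beta) -> float:
    """Breslow negative log partial likelihood, averaged over the n subjects."""
    n = len(times)
    eta = _linear_predictor(X, beta)
    total = 0.0
    for i in range(n):
        if not events[i]:
            continue  # censored subjects contribute only through risk sets
        risk_sum = sum(math.exp(eta[j]) for j in range(n) if times[j] >= times[i])
        total -= eta[i] - math.log(risk_sum)
    return total / n


def _gradient(X, times, events, beta) -> List[float]:
    """Gradient of neg_log_partial_likelihood with respect to beta."""
    n, p = len(times), len(beta)
    w = [math.exp(e) for e in _linear_predictor(X, beta)]
    grad = [0.0] * p
    for i in range(n):
        if not events[i]:
            continue
        risk_set = [j for j in range(n) if times[j] >= times[i]]
        denom = sum(w[j] for j in risk_set)
        for k in range(p):
            weighted_mean = sum(w[j] * X[j][k] for j in risk_set) / denom
            grad[k] -= (X[i][k] - weighted_mean) / n
    return grad


def _soft_threshold(z: float, t: float) -> float:
    """Proximal operator of t * |z|: the lasso shrinkage step."""
    return math.copysign(max(abs(z) - t, 0.0), z)


def fit_lasso_cox(X, times, events, lam: float, lr: float = 0.1, n_iter: int = 500) -> List[float]:
    """Minimize neg_log_partial_likelihood + lam * ||beta||_1, starting from beta = 0."""
    beta = [0.0] * len(X[0])
    for _ in range(n_iter):
        grad = _gradient(X, times, events, beta)
        beta = [_soft_threshold(b - lr * g, lr * lam) for b, g in zip(beta, grad)]
    return beta


if __name__ == "__main__":
    # Hypothetical toy data: columns are early_stage (0/1) and age rescaled to [0, 1].
    X = [[0, 0.9], [0, 0.7], [1, 0.8], [0, 0.4], [1, 0.3], [1, 0.2]]
    times = [1, 2, 3, 4, 5, 6]
    events = [1, 1, 1, 0, 1, 0]  # 1 = death observed, 0 = censored
    print(fit_lasso_cox(X, times, events, lam=0.01))  # negative = lower hazard
```

A large `lam` shrinks every coefficient to zero, while smaller values let the most informative features enter first, one reason lasso paths are sometimes read as an importance ranking. The fixed step size of 0.1 is small enough for features scaled to [0, 1]; a real analysis would rely on an established survival library and cross-validated penalty selection rather than this loop.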

Keywords: cancer survivorship; head and neck cancer; machine learning; epidemiology