Prediction of Ovarian Cancer Survival using Machine Learning: A Population-Based Study
Author(s): Munetoshi Akazawa, Kazunori Hashimoto
Objective: Accurate prediction of prognosis could enable risk stratification of patients and serve as a decision-making tool for adjuvant chemotherapy. This study aimed to predict the prognosis of ovarian cancer using machine learning models.
Materials and Methods: We included patients with epithelial ovarian cancer between 2004 and 2019, extracted from the Surveillance, Epidemiology, and End Results (SEER) database. We predicted the 5-year overall survival of patients using 12 clinicopathological variables. Two machine learning models, a gradient boosting machine (XGBoost) and an artificial neural network, were compared with traditional logistic regression. Using 5-fold cross-validation, we evaluated model performance by classification accuracy and the area under the receiver operating characteristic curve (AUC). The importance of each variable in the construction of the prediction models was also evaluated.
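The evaluation scheme described above (5-fold cross-validation scored by classification accuracy and AUC) can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the data are synthetic, the classifier is a small hand-written logistic regression standing in for the three models, and every function name (`make_synthetic`, `k_fold_indices`, `fit_logistic`, `cross_validate`) is hypothetical. The 0.5 probability threshold used for accuracy is also an assumption.

```python
import math
import random
import statistics

def make_synthetic(n=300, seed=1):
    """Synthetic two-feature binary data (illustrative only, not SEER data)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)
        X.append([x1, x2])
        y.append(1 if x1 + 0.5 * x2 + rng.gauss(0, 0.5) > 0 else 0)
    return X, y

def k_fold_indices(n, k, seed=0):
    """Shuffle range(n) and split it into k nearly equal, disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(w, b, x):
    """Predicted probability of the positive class for one case."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def fit_logistic(X, y, lr=0.1, epochs=200):
    """Logistic regression fitted by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict(w, b, xi) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of cases whose thresholded probability matches the label."""
    hits = sum((p >= threshold) == bool(t) for t, p in zip(y_true, y_prob))
    return hits / len(y_true)

def roc_auc(y_true, y_prob):
    """AUC as the chance that a positive case outranks a negative one (ties count 0.5)."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

def cross_validate(X, y, k=5):
    """Train on k-1 folds, score on the held-out fold, and average over folds."""
    accs, aucs = [], []
    for held in k_fold_indices(len(X), k):
        held_set = set(held)
        train = [i for i in range(len(X)) if i not in held_set]
        w, b = fit_logistic([X[i] for i in train], [y[i] for i in train])
        probs = [predict(w, b, X[i]) for i in held]
        labels = [y[i] for i in held]
        accs.append(accuracy(labels, probs))
        aucs.append(roc_auc(labels, probs))
    return statistics.mean(accs), statistics.mean(aucs)

if __name__ == "__main__":
    X, y = make_synthetic()
    mean_acc, mean_auc = cross_validate(X, y)
    print(f"mean accuracy = {mean_acc:.3f}, mean AUC = {mean_auc:.3f}")
```

In practice, the same loop would be run once per model (XGBoost, neural network, logistic regression) on identical folds so that the scores are directly comparable.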
Results: A total of 18,438 patients were included in the study. Among the three prediction models, XGBoost exhibited the best performance, followed by the artificial neural network and logistic regression. XGBoost achieved a classification accuracy of 0.809 (95% CI: 0.807–0.810) and an AUC of 0.808 (95% CI: 0.806–0.809). The classification accuracy and AUC were 0.802 (95% CI: 0.794–0.809) and 0.797 (95% CI: 0.784–0.808) for the artificial neural network, and 0.791 (95% CI: 0.787–0.794) and 0.784 (95% CI: 0.780–0.786) for logistic regression, respectively. In the XGBoost model, the most important variable was “summary stage,” followed by “year of diagnosis” and “M classification.”
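The abstract does not state how the 95% confidence intervals were derived. One common approach, shown here purely as an assumption, is a Student's t interval around the mean of the five per-fold scores. The fold scores below are hypothetical values chosen for illustration and are not results from the study; `mean_ci_5fold` is likewise a hypothetical helper.

```python
import math
import statistics

# Two-sided 95% critical value of Student's t with 4 degrees of freedom,
# which applies when exactly 5 fold scores are averaged.
T_CRIT_DF4 = 2.776

def mean_ci_5fold(scores):
    """Return (mean, lower, upper) of a 95% t interval for 5 fold scores."""
    if len(scores) != 5:
        raise ValueError("this sketch assumes exactly 5 folds")
    m = statistics.mean(scores)
    # Half-width = t critical value times the standard error of the mean.
    half = T_CRIT_DF4 * statistics.stdev(scores) / math.sqrt(len(scores))
    return m, m - half, m + half

if __name__ == "__main__":
    # Hypothetical per-fold AUC values, not taken from the paper.
    fold_auc = [0.80, 0.81, 0.79, 0.82, 0.80]
    m, lo, hi = mean_ci_5fold(fold_auc)
    print(f"AUC = {m:.3f} (95% CI: {lo:.3f} to {hi:.3f})")
```

Other schemes, such as bootstrapping the held-out predictions, would give different intervals, so the reported values should not be assumed to follow this formula.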
Conclusion: Using machine learning, we were able to predict the prognosis of ovarian cancer. The machine learning models showed better prediction performance than the logistic regression model.