Background: Machine learning (ML) refers to computer algorithms that identify patterns in data for prediction, and ML methods have proved useful for developing prediction models from large, heterogeneous datasets. We aimed to examine the prognostic ability of an ML-based prediction algorithm that uses routine health checkup data to predict all-cause mortality (ACM), compared with established risk prediction approaches.
Methods: A total of 86,155 patients with 70 available parameters (35 clinical, 32 laboratory, and 3 coronary artery calcium score [CACS] parameters) were analyzed. ML involved feature selection, random splitting of the data into a training set (70%) and a test set (30%), and model building with a boosted ensemble algorithm. The developed ML model was validated in a separate cohort of 4,915 patients. The performance of ML for predicting ACM was compared with the following models: (i) the Framingham risk score (FRS) + CACS, (ii) atherosclerotic cardiovascular disease (ASCVD) + CACS, and (iii) a logistic regression (LR) model.
Results: In the derivation dataset, 690 patients died during a median follow-up of 4.6 years (interquartile range, 3.0–6.6 years). The area under the curve (AUC) of the ML model was significantly higher than that of the other models in the test set (ML: 0.82; FRS + CACS: 0.70; ASCVD + CACS: 0.74; LR model: 0.79; p < 0.05 for all), but not statistically significantly higher in the validation set (ML: 0.78; FRS + CACS: 0.62; ASCVD + CACS: 0.72; LR model: 0.74; p = 0.572 and 0.625 for ASCVD + CACS and the LR model, respectively). The ML model improved reclassification over the other models in low- to intermediate-risk patients (p < 0.001 for all).
Conclusion: The prediction algorithm derived by ML methods showed a robust ability to predict ACM and improved reclassification over established conventional risk prediction approaches in an asymptomatic population undergoing a health checkup.
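The random 70%/30% split described in the Methods and the AUC metric used to compare models in the Results can be illustrated with a minimal sketch. This is not the study's code: the function names `train_test_split_indices` and `auc`, the fixed seed, and the toy labels and scores are assumptions made for illustration, and the AUC is computed by its pairwise-ranking definition rather than by whatever software the authors used.

```python
import random


def train_test_split_indices(n, test_fraction=0.30, seed=0):
    """Randomly partition the indices 0..n-1 into (train, test) lists.

    Hypothetical helper: the 30% test fraction mirrors the abstract,
    the seed is an arbitrary choice for reproducibility.
    """
    rng = random.Random(seed)
    indices = list(range(n))
    rng.shuffle(indices)
    n_test = round(n * test_fraction)
    return indices[n_test:], indices[:n_test]


def auc(labels, scores):
    """Area under the ROC curve via its ranking interpretation.

    Returns the fraction of (positive, negative) pairs in which the
    positive case receives the higher score; ties count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data only: 1 = died during follow-up, 0 = survived.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80]
toy_auc = auc(toy_labels, toy_scores)

train_idx, test_idx = train_test_split_indices(10)
```

In this toy case three of the four positive-negative pairs are ranked correctly, so `toy_auc` is 0.75. The quadratic pairwise loop is chosen for clarity; for a cohort of the size reported here a rank-based formulation would be the practical choice.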
Bibliographical note
Funding Information:
This work was supported by an Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIT) (No. 2018-0-00861, Intelligent SW Technology Development for Medical Data Analysis).
All Science Journal Classification (ASJC) codes
- Radiology, Nuclear Medicine and Imaging
- Cardiology and Cardiovascular Medicine