Development and verification of prediction models for preventing cardiovascular diseases

Ji Min Sung, In Jeong Cho, David Sung, Sunhee Kim, Hyeon Chang Kim, Myeong Hun Chae, Maryam Kavousi, Oscar L. Rueda-Ochoa, M. Arfan Ikram, Oscar H. Franco, Hyuk Jae Chang

Research output: Contribution to journal (Article)

Abstract

Objectives: Cardiovascular disease (CVD) is one of the major causes of death worldwide. To improve the accuracy of CVD prediction, risk classification was performed using national time-series health examination data. These data offer an opportunity to apply deep learning (RNN-LSTM), which is widely regarded as a strong approach for analyzing time-series datasets. The objective of this study was to demonstrate the improved accuracy of deep learning by comparing the performance of a Cox hazard regression and RNN-LSTM based on survival analysis.

Methods and findings: We selected 361,239 subjects (aged 40 to 79 years) with more than two health examination records from 2002–2006 using the National Health Insurance System-National Health Screening Cohort (NHIS-HEALS). The average number of health screenings (from 2002–2013) used in the analysis was 2.9 ± 1.0. Two CVD prediction models were developed from the NHIS-HEALS data: a Cox hazard regression model and a deep learning model. In an internal validation on the NHIS-HEALS dataset, the Cox regression model showed a highest time-dependent area under the curve (AUC) of 0.79 (95% CI 0.70 to 0.87) in females and 0.75 (95% CI 0.70 to 0.80) in males at 2 years. The deep learning model showed a highest time-dependent AUC of 0.94 (95% CI 0.91 to 0.97) in females and 0.96 (95% CI 0.95 to 0.97) in males at 2 years. Layer-wise Relevance Propagation (LRP) revealed that age was the variable with the greatest effect on CVD, followed by systolic blood pressure (SBP) and diastolic blood pressure (DBP), in that order.

Conclusion: The deep learning model predicted CVD occurrences better than the Cox regression model. In addition, LRP applied to the study results recovered the known risk factors that previous clinical studies have shown to be important.
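
The AUC comparison reported in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the study reports a time-dependent AUC from a survival analysis, whereas the sketch below is the simpler fixed-horizon AUC that ignores censoring. The function name `auc`, the two score lists, and the event labels are hypothetical toy values chosen for illustration and are not taken from NHIS-HEALS.

```python
from typing import Sequence


def auc(scores: Sequence[float], events: Sequence[int]) -> float:
    """Fixed-horizon AUC: probability that a randomly chosen case
    (event == 1) receives a higher risk score than a randomly chosen
    non-case (event == 0). Tied scores count as one half.

    Simplification: censoring is ignored, so this is NOT the
    time-dependent AUC used in the study.
    """
    cases = [s for s, e in zip(scores, events) if e == 1]
    controls = [s for s, e in zip(scores, events) if e == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")

    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical toy data (not study data): 1 = CVD event within the horizon.
events = [1, 0, 1, 0, 0, 1]
model_a = [0.9, 0.2, 0.8, 0.3, 0.1, 0.7]  # risk scores from one model
model_b = [0.6, 0.5, 0.4, 0.7, 0.2, 0.8]  # risk scores from another model

auc_a = auc(model_a, events)
auc_b = auc(model_b, events)
print(f"model A AUC = {auc_a:.2f}, model B AUC = {auc_b:.2f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of cases from non-cases, which is the scale on which the 0.75 to 0.96 values in the abstract should be read.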

Original language: English
Article number: e0222809
Journal: PLoS One
Volume: 14
Issue number: 9
DOI: 10.1371/journal.pone.0222809
Publication status: Published - 2019 Jan 1

Fingerprint

Cardiovascular Diseases
Health
Learning
Health insurance
prediction
National Health Programs
Proportional Hazards Models
Screening
Blood Pressure
Area Under Curve
Time series
time series analysis
Hazards
Survival Analysis
disease occurrence

All Science Journal Classification (ASJC) codes

  • Biochemistry, Genetics and Molecular Biology(all)
  • Agricultural and Biological Sciences(all)
  • General

Cite this

Sung, Ji Min ; Cho, In Jeong ; Sung, David ; Kim, Sunhee ; Kim, Hyeon Chang ; Chae, Myeong Hun ; Kavousi, Maryam ; Rueda-Ochoa, Oscar L. ; Arfan Ikram, M. ; Franco, Oscar H. ; Chang, Hyuk Jae. / Development and verification of prediction models for preventing cardiovascular diseases. In: PloS one. 2019 ; Vol. 14, No. 9.
@article{9c2b6be61a124ec1a4a95d9c19ad1a68,
title = "Development and verification of prediction models for preventing cardiovascular diseases",
abstract = "Objectives: Cardiovascular disease (CVD) is one of the major causes of death worldwide. To improve the accuracy of CVD prediction, risk classification was performed using national time-series health examination data. These data offer an opportunity to apply deep learning (RNN-LSTM), which is widely regarded as a strong approach for analyzing time-series datasets. The objective of this study was to demonstrate the improved accuracy of deep learning by comparing the performance of a Cox hazard regression and RNN-LSTM based on survival analysis. Methods and findings: We selected 361,239 subjects (aged 40 to 79 years) with more than two health examination records from 2002–2006 using the National Health Insurance System-National Health Screening Cohort (NHIS-HEALS). The average number of health screenings (from 2002–2013) used in the analysis was 2.9 ± 1.0. Two CVD prediction models were developed from the NHIS-HEALS data: a Cox hazard regression model and a deep learning model. In an internal validation on the NHIS-HEALS dataset, the Cox regression model showed a highest time-dependent area under the curve (AUC) of 0.79 (95{\%} CI 0.70 to 0.87) in females and 0.75 (95{\%} CI 0.70 to 0.80) in males at 2 years. The deep learning model showed a highest time-dependent AUC of 0.94 (95{\%} CI 0.91 to 0.97) in females and 0.96 (95{\%} CI 0.95 to 0.97) in males at 2 years. Layer-wise Relevance Propagation (LRP) revealed that age was the variable with the greatest effect on CVD, followed by systolic blood pressure (SBP) and diastolic blood pressure (DBP), in that order. Conclusion: The deep learning model predicted CVD occurrences better than the Cox regression model. In addition, LRP applied to the study results recovered the known risk factors that previous clinical studies have shown to be important.",
author = "Sung, {Ji Min} and Cho, {In Jeong} and David Sung and Sunhee Kim and Kim, {Hyeon Chang} and Chae, {Myeong Hun} and Maryam Kavousi and Rueda-Ochoa, {Oscar L.} and {Arfan Ikram}, M. and Franco, {Oscar H.} and Chang, {Hyuk Jae}",
year = "2019",
month = "1",
day = "1",
doi = "10.1371/journal.pone.0222809",
language = "English",
volume = "14",
journal = "PLoS One",
issn = "1932-6203",
publisher = "Public Library of Science",
number = "9",
pages = "e0222809",

}

Sung, JM, Cho, IJ, Sung, D, Kim, S, Kim, HC, Chae, MH, Kavousi, M, Rueda-Ochoa, OL, Arfan Ikram, M, Franco, OH & Chang, HJ 2019, 'Development and verification of prediction models for preventing cardiovascular diseases', PloS one, vol. 14, no. 9, e0222809. https://doi.org/10.1371/journal.pone.0222809

Development and verification of prediction models for preventing cardiovascular diseases. / Sung, Ji Min; Cho, In Jeong; Sung, David; Kim, Sunhee; Kim, Hyeon Chang; Chae, Myeong Hun; Kavousi, Maryam; Rueda-Ochoa, Oscar L.; Arfan Ikram, M.; Franco, Oscar H.; Chang, Hyuk Jae.

In: PloS one, Vol. 14, No. 9, e0222809, 01.01.2019.

TY - JOUR

T1 - Development and verification of prediction models for preventing cardiovascular diseases

AU - Sung, Ji Min

AU - Cho, In Jeong

AU - Sung, David

AU - Kim, Sunhee

AU - Kim, Hyeon Chang

AU - Chae, Myeong Hun

AU - Kavousi, Maryam

AU - Rueda-Ochoa, Oscar L.

AU - Arfan Ikram, M.

AU - Franco, Oscar H.

AU - Chang, Hyuk Jae

PY - 2019/1/1

Y1 - 2019/1/1

AB - Objectives: Cardiovascular disease (CVD) is one of the major causes of death worldwide. To improve the accuracy of CVD prediction, risk classification was performed using national time-series health examination data. These data offer an opportunity to apply deep learning (RNN-LSTM), which is widely regarded as a strong approach for analyzing time-series datasets. The objective of this study was to demonstrate the improved accuracy of deep learning by comparing the performance of a Cox hazard regression and RNN-LSTM based on survival analysis. Methods and findings: We selected 361,239 subjects (aged 40 to 79 years) with more than two health examination records from 2002–2006 using the National Health Insurance System-National Health Screening Cohort (NHIS-HEALS). The average number of health screenings (from 2002–2013) used in the analysis was 2.9 ± 1.0. Two CVD prediction models were developed from the NHIS-HEALS data: a Cox hazard regression model and a deep learning model. In an internal validation on the NHIS-HEALS dataset, the Cox regression model showed a highest time-dependent area under the curve (AUC) of 0.79 (95% CI 0.70 to 0.87) in females and 0.75 (95% CI 0.70 to 0.80) in males at 2 years. The deep learning model showed a highest time-dependent AUC of 0.94 (95% CI 0.91 to 0.97) in females and 0.96 (95% CI 0.95 to 0.97) in males at 2 years. Layer-wise Relevance Propagation (LRP) revealed that age was the variable with the greatest effect on CVD, followed by systolic blood pressure (SBP) and diastolic blood pressure (DBP), in that order. Conclusion: The deep learning model predicted CVD occurrences better than the Cox regression model. In addition, LRP applied to the study results recovered the known risk factors that previous clinical studies have shown to be important.

UR - http://www.scopus.com/inward/record.url?scp=85072393667&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85072393667&partnerID=8YFLogxK

U2 - 10.1371/journal.pone.0222809

DO - 10.1371/journal.pone.0222809

M3 - Article

C2 - 31536581

AN - SCOPUS:85072393667

VL - 14

JO - PLoS One

JF - PLoS One

SN - 1932-6203

IS - 9

M1 - e0222809

ER -