Prediction model development of late-onset preeclampsia using machine learning-based methods

Jong Hyun Jhee, Sunghee Lee, Yejin Park, Sang Eun Lee, Young Ah Kim, Shin-Wook Kang, Ja Young Kwon, Jung Tak Park

Research output: Contribution to journal (Article)

Abstract

Preeclampsia is one of the leading causes of maternal and fetal morbidity and mortality. Because effective preventive measures are lacking, predicting it is essential for prompt management. This study aimed to develop machine learning models that predict late-onset preeclampsia from hospital electronic medical record data. The performance of the machine learning-based models was also compared with that of models using conventional statistical methods. A total of 11,006 pregnant women who received antenatal care at Yonsei University Hospital were included. Maternal data were retrieved from electronic medical records from the early second trimester to 34 weeks. The prediction outcome was the occurrence of late-onset preeclampsia after 34 weeks' gestation. Pattern recognition and cluster analysis were used to select the parameters included in the prediction models. Logistic regression, a decision tree model, naïve Bayes classification, a support vector machine, a random forest algorithm, and a stochastic gradient boosting method were used to construct the prediction models. The C-statistic was used to assess the performance of each model. The overall preeclampsia development rate was 4.7% (474 patients). Systolic blood pressure, serum blood urea nitrogen and creatinine levels, platelet counts, serum potassium level, white blood cell count, serum calcium level, and urinary protein were the most influential variables included in the prediction models. C-statistics for the decision tree model, naïve Bayes classification, support vector machine, random forest algorithm, stochastic gradient boosting method, and logistic regression models were 0.857, 0.776, 0.573, 0.894, 0.924, and 0.806, respectively. The stochastic gradient boosting model had the best prediction performance, with an accuracy of 0.973 and a false positive rate of 0.009. The combined use of maternal factors and common antenatal laboratory data from the early second trimester through the early third trimester could effectively predict late-onset preeclampsia using machine learning algorithms. Future prospective studies are needed to verify the clinical applicability of these algorithms.
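
The abstract reports three kinds of metric: the C-statistic, accuracy, and the false positive rate. The following is a minimal sketch of how these quantities are commonly defined, not the authors' code. The function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration and are not taken from the study.

```python
from typing import Sequence, Tuple


def c_statistic(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Concordance: share of (case, non-case) pairs where the case scores higher.

    Tied scores count as half a concordant pair. For a binary outcome this
    equals the area under the ROC curve.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    non_cases = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for c in cases:
        for n in non_cases:
            if c > n:
                concordant += 1.0
            elif c == n:
                concordant += 0.5
    return concordant / (len(cases) * len(non_cases))


def accuracy_and_fpr(
    labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Tuple[float, float]:
    """Accuracy and false positive rate at an assumed decision threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    accuracy = (tp + tn) / len(labels)
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return accuracy, fpr


# Hypothetical toy data: 1 = developed preeclampsia, 0 = did not.
labels = [0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.6, 0.3, 0.7, 0.4]

c_stat = c_statistic(labels, scores)
acc, fpr = accuracy_and_fpr(labels, scores)
print(f"C-statistic={c_stat:.3f} accuracy={acc:.3f} FPR={fpr:.3f}")
```

With an outcome as uncommon as the one reported here, accuracy alone can look high even for a model that rarely flags a case, which is one reason to read it alongside the C-statistic and the false positive rate.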

Original language: English
Article number: e0221202
Journal: PloS one
Volume: 14
Issue number: 8
DOI: 10.1371/journal.pone.0221202
Publication status: Published - 2019 Jan 1

Fingerprint

pre-eclampsia
artificial intelligence
Pre-Eclampsia
Learning systems
prediction
Decision Trees
Electronic Health Records
Logistic Models
Mothers
Second Pregnancy Trimester
Serum
blood serum
Blood Pressure
Fetal Mortality
methodology
Electronic medical equipment
Prenatal Care
Blood Urea Nitrogen
Third Pregnancy Trimester
Platelet Count

All Science Journal Classification (ASJC) codes

  • Biochemistry, Genetics and Molecular Biology(all)
  • Agricultural and Biological Sciences(all)

Cite this

Jhee, Jong Hyun ; Lee, Sunghee ; Park, Yejin ; Lee, Sang Eun ; Kim, Young Ah ; Kang, Shin-Wook ; Kwon, Ja Young ; Park, Jung Tak. / Prediction model development of late-onset preeclampsia using machine learning-based methods. In: PloS one. 2019 ; Vol. 14, No. 8.
@article{0f8502d1ef694bf0bc497600c66d11d5,
title = "Prediction model development of late-onset preeclampsia using machine learning-based methods",
abstract = "Preeclampsia is one of the leading causes of maternal and fetal morbidity and mortality. Because effective preventive measures are lacking, predicting it is essential for prompt management. This study aimed to develop machine learning models that predict late-onset preeclampsia from hospital electronic medical record data. The performance of the machine learning-based models was also compared with that of models using conventional statistical methods. A total of 11,006 pregnant women who received antenatal care at Yonsei University Hospital were included. Maternal data were retrieved from electronic medical records from the early second trimester to 34 weeks. The prediction outcome was the occurrence of late-onset preeclampsia after 34 weeks' gestation. Pattern recognition and cluster analysis were used to select the parameters included in the prediction models. Logistic regression, a decision tree model, na{\"i}ve Bayes classification, a support vector machine, a random forest algorithm, and a stochastic gradient boosting method were used to construct the prediction models. The C-statistic was used to assess the performance of each model. The overall preeclampsia development rate was 4.7{\%} (474 patients). Systolic blood pressure, serum blood urea nitrogen and creatinine levels, platelet counts, serum potassium level, white blood cell count, serum calcium level, and urinary protein were the most influential variables included in the prediction models. C-statistics for the decision tree model, na{\"i}ve Bayes classification, support vector machine, random forest algorithm, stochastic gradient boosting method, and logistic regression models were 0.857, 0.776, 0.573, 0.894, 0.924, and 0.806, respectively. The stochastic gradient boosting model had the best prediction performance, with an accuracy of 0.973 and a false positive rate of 0.009. The combined use of maternal factors and common antenatal laboratory data from the early second trimester through the early third trimester could effectively predict late-onset preeclampsia using machine learning algorithms. Future prospective studies are needed to verify the clinical applicability of these algorithms.",
author = "Jhee, {Jong Hyun} and Sunghee Lee and Yejin Park and Lee, {Sang Eun} and Kim, {Young Ah} and Shin-Wook Kang and Kwon, {Ja Young} and Park, {Jung Tak}",
year = "2019",
month = "1",
day = "1",
doi = "10.1371/journal.pone.0221202",
language = "English",
volume = "14",
journal = "PLoS One",
issn = "1932-6203",
publisher = "Public Library of Science",
number = "8",

}

Prediction model development of late-onset preeclampsia using machine learning-based methods. / Jhee, Jong Hyun; Lee, Sunghee; Park, Yejin; Lee, Sang Eun; Kim, Young Ah; Kang, Shin-Wook; Kwon, Ja Young; Park, Jung Tak.

In: PloS one, Vol. 14, No. 8, e0221202, 01.01.2019.

TY - JOUR

T1 - Prediction model development of late-onset preeclampsia using machine learning-based methods

AU - Jhee, Jong Hyun

AU - Lee, Sunghee

AU - Park, Yejin

AU - Lee, Sang Eun

AU - Kim, Young Ah

AU - Kang, Shin-Wook

AU - Kwon, Ja Young

AU - Park, Jung Tak

PY - 2019/1/1

Y1 - 2019/1/1

N2 - Preeclampsia is one of the leading causes of maternal and fetal morbidity and mortality. Because effective preventive measures are lacking, predicting it is essential for prompt management. This study aimed to develop machine learning models that predict late-onset preeclampsia from hospital electronic medical record data. The performance of the machine learning-based models was also compared with that of models using conventional statistical methods. A total of 11,006 pregnant women who received antenatal care at Yonsei University Hospital were included. Maternal data were retrieved from electronic medical records from the early second trimester to 34 weeks. The prediction outcome was the occurrence of late-onset preeclampsia after 34 weeks' gestation. Pattern recognition and cluster analysis were used to select the parameters included in the prediction models. Logistic regression, a decision tree model, naïve Bayes classification, a support vector machine, a random forest algorithm, and a stochastic gradient boosting method were used to construct the prediction models. The C-statistic was used to assess the performance of each model. The overall preeclampsia development rate was 4.7% (474 patients). Systolic blood pressure, serum blood urea nitrogen and creatinine levels, platelet counts, serum potassium level, white blood cell count, serum calcium level, and urinary protein were the most influential variables included in the prediction models. C-statistics for the decision tree model, naïve Bayes classification, support vector machine, random forest algorithm, stochastic gradient boosting method, and logistic regression models were 0.857, 0.776, 0.573, 0.894, 0.924, and 0.806, respectively. The stochastic gradient boosting model had the best prediction performance, with an accuracy of 0.973 and a false positive rate of 0.009. The combined use of maternal factors and common antenatal laboratory data from the early second trimester through the early third trimester could effectively predict late-onset preeclampsia using machine learning algorithms. Future prospective studies are needed to verify the clinical applicability of these algorithms.

UR - http://www.scopus.com/inward/record.url?scp=85071308561&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85071308561&partnerID=8YFLogxK

U2 - 10.1371/journal.pone.0221202

DO - 10.1371/journal.pone.0221202

M3 - Article

VL - 14

JO - PLoS One

JF - PLoS One

SN - 1932-6203

IS - 8

M1 - e0221202

ER -