The purpose of this study was to evaluate and compare the diagnostic performance of deep convolutional neural networks (CNNs) and expert radiologists in differentiating thyroid nodules on ultrasonography (US), and to validate the results in multicenter data sets. This multicenter retrospective study collected 15,375 US images of thyroid nodules for algorithm development (n = 13,560, Severance Hospital, SH training set), internal testing (n = 634, SH test set), and external testing (n = 781, Samsung Medical Center, SMC set; n = 200, CHA Bundang Medical Center, CBMC set; n = 200, Kyung Hee University Hospital, KUH set). Two individual CNNs and two classification ensembles (CNNE1 and CNNE2) were tested for differentiating malignant from benign thyroid nodules. The CNNs demonstrated high areas under the curve (AUCs) for diagnosing malignant thyroid nodules (0.898–0.937 in the internal test set and 0.821–0.885 in the external test sets). The AUC was significantly higher for CNNE2 than for radiologists in the SH test set (0.932 vs. 0.840, P < 0.001). The AUC did not differ significantly between CNNE2 and radiologists in the external test sets (P = 0.113, 0.126, and 0.690). The CNNs showed diagnostic performance comparable to that of expert radiologists for differentiating thyroid nodules on US in both the internal and external test sets.
Bibliographical note
Funding Information:
This study was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) by the Ministry of Education (2016R1D1A1B03930375) and by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (2019R1A2C1002375). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
© 2020, The Author(s).