Automated extraction of information of lung cancer staging from unstructured reports of PET-CT interpretation: natural language processing with deep-learning

Hyung Jun Park, Namu Park, Jang Ho Lee, Myeong Geun Choi, Jin Sook Ryu, Min Song, Chang Min Choi

Research output: Contribution to journal, Article, peer-review

Abstract

Background: Extracting metastatic information from previous radiology text reports is important; however, the laborious annotation these texts require has limited their usability. We developed a deep-learning model that extracts the primary lung cancer site, metastatic lymph nodes, and distant metastasis information from PET-CT reports to determine lung cancer stage.

Methods: PET-CT reports written entirely in English were acquired from two cohorts of patients with lung cancer diagnosed at a tertiary hospital between January 2004 and March 2020. One cohort of 20,466 PET-CT reports was used for the training and validation sets, and the other cohort of 4190 PET-CT reports was used as an additional test set. A pre-processing model (Lung Cancer Spell Checker) was applied to correct typographical errors, and pseudo-labelling was used to train the model. The deep-learning model was built as a convolutional recurrent neural network. The performance metrics for the prediction model were accuracy, precision, sensitivity, micro-AUROC, and AUPRC.

Results: For extracting the primary lung cancer location, the model achieved a micro-AUROC of 0.913 in the validation set and 0.946 in the additional test set. For metastatic lymph nodes, it showed a sensitivity of 0.827 and a specificity of 0.960. For predicting distant metastasis, it achieved a micro-AUROC of 0.944 in the validation set and 0.950 in the additional test set.

Conclusion: Our deep-learning method could be used to extract lung cancer stage information from PET-CT reports and may facilitate lung cancer studies by reducing the laborious annotation required of clinicians.
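
The abstract reports sensitivity, specificity, micro-AUROC, and AUPRC. Below is a minimal sketch of how these metrics can be computed for a multi-label extraction task. It assumes that each report is scored against a fixed set of binary labels (for example, one label per candidate site). That label layout, the 0.5 threshold, and the toy scores are hypothetical, and this is not the authors' evaluation code. Micro-averaging pools every (report, label) pair before a single curve is computed.

```python
"""Hypothetical evaluation sketch for multi-label report extraction.

Assumes y_true[i][j] is 1 if report i carries label j (e.g. a given site)
and y_score[i][j] is the model's probability for that label.
Illustrative only; not the authors' code.
"""
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def flatten(rows: Matrix) -> List[float]:
    # Micro-averaging pools every (report, label) pair into a single list.
    return [v for row in rows for v in row]


def binary_auroc(y_true: Sequence[float], y_score: Sequence[float]) -> float:
    # AUROC equals the probability that a random positive outscores a random
    # negative (Mann-Whitney U statistic); tied scores count as half a win.
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[float], y_score: Sequence[float]) -> float:
    # AUPRC estimated as average precision: the mean of the precision at each
    # positive's rank, sorted by descending score (ties keep input order).
    order = sorted(range(len(y_score)), key=lambda i: -y_score[i])
    n_pos = sum(1 for t in y_true if t == 1)
    if n_pos == 0:
        raise ValueError("need at least one positive")
    tp, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            tp += 1
            total += tp / rank
    return total / n_pos


def sensitivity_specificity(y_true: Sequence[float], y_score: Sequence[float],
                            threshold: float = 0.5) -> Tuple[float, float]:
    # Binarise scores at a (hypothetical) threshold and count the confusion matrix.
    tp = fn = tn = fp = 0
    for t, s in zip(y_true, y_score):
        predicted_positive = s >= threshold
        if t == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def micro_auroc(y_true: Matrix, y_score: Matrix) -> float:
    return binary_auroc(flatten(y_true), flatten(y_score))


def micro_auprc(y_true: Matrix, y_score: Matrix) -> float:
    return average_precision(flatten(y_true), flatten(y_score))


if __name__ == "__main__":
    # Toy data: 3 reports x 3 labels (invented, not study data).
    labels = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    scores = [[0.9, 0.2, 0.1], [0.3, 0.8, 0.4], [0.2, 0.35, 0.38]]
    print(f"micro-AUROC: {micro_auroc(labels, scores):.4f}")  # 0.9444 (17/18)
    print(f"micro-AUPRC: {micro_auprc(labels, scores):.4f}")  # 0.9167 (2.75/3)
    sens, spec = sensitivity_specificity(flatten(labels), flatten(scores))
    print(f"sensitivity {sens:.3f}, specificity {spec:.3f}")  # 0.667, 1.000
```

On label-indicator arrays, scikit-learn's `roc_auc_score(..., average="micro")` and `average_precision_score(..., average="micro")` pool the pairs in the same way. The pure-Python version is shown only to make the pooling step explicit.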

Original language: English
Article number: 229
Journal: BMC Medical Informatics and Decision Making
Volume: 22
Issue number: 1
Publication status: Published, December 2022

Bibliographical note

Funding Information:
This study was supported by a grant (Elimination of Cancer Project Fund) from Asan Cancer Institute of Asan Medical Center, Seoul, and the Institute of Information and Communications Technology Planning and Evaluation (IITP) grant funded by the Korean government (MSIT) (No. 2020-0-01361, Artificial Intelligence Graduate School Program (Yonsei University)).

Publisher Copyright:
© 2022, The Author(s).

All Science Journal Classification (ASJC) codes

  • Health Policy
  • Health Informatics
  • Computer Science Applications
