Generating sequential electronic health records using dual adversarial autoencoder

Dongha Lee, Hwanjo Yu, Xiaoqian Jiang, Deevakar Rogith, Meghana Gudala, Mubeen Tejani, Qiuchen Zhang, Li Xiong

Research output: Contribution to journal › Article › peer-review


Objective: Recent studies on electronic health records (EHRs) have begun to train deep generative models that synthesize large numbers of realistic records, in order to address the significant privacy issues surrounding EHRs. However, most of them focus only on structured records of patients’ independent visits, rather than on chronological clinical records. In this article, we aim to learn and synthesize realistic sequences of EHRs using a generative autoencoder.

Materials and Methods: We propose a dual adversarial autoencoder (DAAE), which learns set-valued sequences of medical entities by combining a recurrent autoencoder with 2 generative adversarial networks (GANs). DAAE improves the mode coverage and quality of generated sequences by adversarially learning both the continuous latent distribution and the discrete data distribution. Using the MIMIC-III (Medical Information Mart for Intensive Care-III) and UT Physicians clinical databases, we evaluated the performance of DAAE in terms of predictive modeling, plausibility, and privacy preservation.

Results: Our generated sequences of EHRs showed performance comparable to real data on a predictive modeling task, and achieved the best score among all baseline models in a plausibility evaluation conducted by medical experts. In addition, differentially private optimization of our model makes it possible to generate synthetic sequences without increasing the privacy leakage of patients’ data.

Conclusions: DAAE can effectively synthesize sequential EHRs by addressing the two main challenges of the task: the synthetic records should be realistic enough not to be distinguished from the real records, and they should cover all the training patients to reproduce the performance of specific downstream tasks.
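To make the phrase "set-valued sequences of medical entities" concrete, the sketch below shows one common way such data can be represented: each patient is a chronological list of visits, each visit is a set of codes, and each visit is encoded as a multi-hot vector. This is only an illustration under stated assumptions, not the authors' preprocessing; the function names (`build_vocab`, `encode_patient`) and the toy diagnosis codes are hypothetical.

```python
from typing import Dict, List, Set

Visit = Set[str]          # one visit = an unordered set of medical codes
Patient = List[Visit]     # one patient = visits in chronological order


def build_vocab(patients: List[Patient]) -> Dict[str, int]:
    """Map every code seen in the data to a column index (sorted for determinism)."""
    codes = sorted({code for visits in patients for visit in visits for code in visit})
    return {code: idx for idx, code in enumerate(codes)}


def encode_patient(visits: Patient, vocab: Dict[str, int]) -> List[List[int]]:
    """Turn a sequence of visits into a sequence of multi-hot vectors."""
    rows = []
    for visit in visits:
        row = [0] * len(vocab)
        for code in visit:
            row[vocab[code]] = 1  # mark the code as present in this visit
        rows.append(row)
    return rows


# Hypothetical toy records (illustrative codes only, not drawn from any dataset).
patients: List[Patient] = [
    [{"I10", "E11"}, {"I10"}],
    [{"J45"}, {"J45", "E11"}, {"I10"}],
]

vocab = build_vocab(patients)
encoded = [encode_patient(p, vocab) for p in patients]
```

In a representation like this, sequences have variable length and each step is a binary vector over the code vocabulary, which is the kind of discrete, sequential input a recurrent autoencoder would consume.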

Original language: English
Pages (from-to): 1411-1419
Number of pages: 9
Journal: Journal of the American Medical Informatics Association: JAMIA
Issue number: 9
Publication status: Published - 2020 Sept 1

Bibliographical note

Funding Information:
This research was supported by an Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (no. 2018-0-00584). XJ is the Cancer Prevention and Research Institute of Texas Scholar in Cancer Research (RR180012), and was supported in part by a Christopher Sarofim Family Professorship, UT Stars award, UTHealth startup, and the National Institutes of Health under award numbers R01GM114612, R01GM118574, and U01TR002062.

Publisher Copyright:
© The Author(s) 2020.

All Science Journal Classification (ASJC) codes

  • Health Informatics

