BioPREP: Deep learning-based predicate classification with SemMedDB

Gibong Hong, Yuheun Kim, Yeon Jung Choi, Min Song

Research output: Contribution to journal › Article › peer-review

3 Citations (Scopus)


Relation Extraction (RE), the task of inferring relations between entities in biomedical texts, has become key to biomedical information extraction. Previous studies relied mainly on rule-based and machine learning-based approaches, which require a demanding amount of feature processing yet yield relatively low accuracy. Some existing biomedical relation extraction tools are based on neural networks, but they rarely analyze possible causes of the differences in accuracy among predicates. In addition, few biomedical datasets have been structured for predicate classification. We therefore set the following research goals: constructing a large-scale training dataset, Biomedical Predicate Relation-extraction with Entity-filtering by PKDE4J (BioPREP), based on SemMedDB with PKDE4J as an entity-filtering tool; and evaluating the performance of neural network-based algorithms on this structured dataset. We then analyzed our model's performance in depth by grouping predicates into semantic clusters. The experiments showed that the BioBERT-based model outperformed other models for predicate classification. The suggested model achieved an F1-score of 0.846 when BioBERT was loaded as the pre-trained model and 0.840 when SciBERT weights were loaded. Moreover, the semantic cluster analysis showed that sentences containing key phrases, such as a comparison verb + ‘than’, were classified better.
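The kind of per-predicate and per-cluster evaluation described in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code. The predicate labels, the cluster names, the toy predictions, and the unweighted averaging within a cluster are all assumptions made for illustration; the abstract does not state which F1 averaging scheme was used.

```python
from collections import defaultdict


def per_label_f1(gold, pred):
    """Return {label: F1} from parallel lists of gold and predicted labels."""
    tp = defaultdict(int)
    fp = defaultdict(int)
    fn = defaultdict(int)
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label was wrong
            fn[g] += 1  # gold label was missed
    scores = {}
    for label in set(gold) | set(pred):
        # F1 = 2*TP / (2*TP + FP + FN)
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def cluster_f1(scores, clusters):
    """Unweighted mean of per-label F1 within each (hypothetical) cluster."""
    out = {}
    for name, labels in clusters.items():
        present = [scores[label] for label in labels if label in scores]
        out[name] = sum(present) / len(present) if present else 0.0
    return out


# Toy data, invented for illustration only.
gold = ["TREATS", "TREATS", "CAUSES", "HIGHER_THAN", "CAUSES", "HIGHER_THAN"]
pred = ["TREATS", "CAUSES", "CAUSES", "HIGHER_THAN", "CAUSES", "HIGHER_THAN"]

# Hypothetical semantic clusters; the paper's actual grouping is not given here.
clusters = {
    "therapeutic": ["TREATS"],
    "causal": ["CAUSES"],
    "comparison": ["HIGHER_THAN"],
}

scores = per_label_f1(gold, pred)
by_cluster = cluster_f1(scores, clusters)

for name in sorted(by_cluster):
    print(f"{name}: {by_cluster[name]:.3f}")
```

Breaking the score down this way makes it possible to see which predicate groups a classifier handles well and which it confuses, which a single aggregate F1 would hide.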

Original language: English
Article number: 103888
Journal: Journal of Biomedical Informatics
Publication status: Published - 2021 Oct

Bibliographical note

Funding Information:
This work was supported by the Bio-Synergy Research Project (NRF-2013M3A9C4078138) of the Ministry of Science, ICT, and Future Planning through the National Research Foundation.

Publisher Copyright:
© 2021 Elsevier Inc.

All Science Journal Classification (ASJC) codes

  • Computer Science Applications
  • Health Informatics

