PKDE4J

Entity and relation extraction for public knowledge discovery

Min Song, Won Chul Kim, Dahee Lee, Go Eun Heo, Keun Young Kang

Research output: Contribution to journal (Article)

37 Citations (Scopus)

Abstract

Because the volume of scientific publications is too large to handle manually, there is rising interest in text-mining techniques for automated information extraction, especially in the biomedical field. Such techniques provide effective means of information search, knowledge discovery, and hypothesis generation. Most previous studies have focused on the design and performance improvement of either named entity recognition or relation extraction. In this paper, we present PKDE4J, a comprehensive text-mining system that integrates dictionary-based entity extraction and rule-based relation extraction in a highly flexible and extensible framework. Building on Stanford CoreNLP, we developed the system to cope with multiple types of entities and relations. The system offers fairly good accuracy and lets users configure its text-processing components. We evaluated it on many corpora and found that it surpasses existing systems, with average F-measures of 85% for entity extraction and 81% for relation extraction.
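
To make the combination of dictionary-based entity extraction and rule-based relation extraction concrete, the following is a minimal illustrative sketch. It is not PKDE4J's implementation and does not use Stanford CoreNLP; the dictionary entries, trigger words, and function names are all hypothetical, and the single rule (a trigger word occurring between two dictionary entities in one sentence) is an assumption chosen for brevity.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical toy dictionary: surface form -> entity type.
ENTITY_DICTIONARY: Dict[str, str] = {
    "aspirin": "DRUG",
    "ibuprofen": "DRUG",
    "headache": "DISEASE",
    "fever": "DISEASE",
}

# Hypothetical trigger words: verb -> relation label.
RELATION_TRIGGERS: Dict[str, str] = {
    "treats": "TREATS",
    "relieves": "TREATS",
    "causes": "CAUSES",
}

Entity = Tuple[str, str, int]      # (text, entity type, token index)
Relation = Tuple[str, str, str]    # (subject text, relation label, object text)


def tokenize(sentence: str) -> List[str]:
    """Lowercase the sentence and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def extract_entities(tokens: List[str]) -> List[Entity]:
    """Dictionary-based step: keep every token found in the dictionary."""
    return [
        (tok, ENTITY_DICTIONARY[tok], i)
        for i, tok in enumerate(tokens)
        if tok in ENTITY_DICTIONARY
    ]


def extract_relations(sentence: str) -> List[Relation]:
    """Rule-based step: emit a relation when a trigger word lies
    between two dictionary entities (in left-to-right order)."""
    tokens = tokenize(sentence)
    entities = extract_entities(tokens)
    relations: List[Relation] = []
    for a_text, _a_type, a_pos in entities:
        for b_text, _b_type, b_pos in entities:
            if a_pos >= b_pos:
                continue  # only consider pairs in left-to-right order
            for tok in tokens[a_pos + 1:b_pos]:
                if tok in RELATION_TRIGGERS:
                    relations.append((a_text, RELATION_TRIGGERS[tok], b_text))
                    break  # use the first trigger found between the pair
    return relations


if __name__ == "__main__":
    print(extract_relations("Aspirin relieves headache."))
```

A sketch like this ignores negation, multi-word terms, and syntactic structure, which is the kind of processing a pipeline built on a full NLP toolkit would be expected to add.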

Original language: English
Pages (from-to): 320-332
Number of pages: 13
Journal: Journal of Biomedical Informatics
Volume: 57
DOI: 10.1016/j.jbi.2015.08.008
Publication status: Published - 2015 Oct 1

Fingerprint

Public Relations
Data Mining
Data mining
Information Storage and Retrieval
Publications
Text processing
Glossaries

All Science Journal Classification (ASJC) codes

  • Computer Science Applications
  • Health Informatics

Cite this

Song, Min ; Kim, Won Chul ; Lee, Dahee ; Heo, Go Eun ; Kang, Keun Young. / PKDE4J : Entity and relation extraction for public knowledge discovery. In: Journal of Biomedical Informatics. 2015 ; Vol. 57. pp. 320-332.
@article{f22a09e83b62444da63fa757736965e0,
title = "PKDE4J: Entity and relation extraction for public knowledge discovery",
abstract = "Due to an enormous number of scientific publications that cannot be handled manually, there is a rising interest in text-mining techniques for automated information extraction, especially in the biomedical field. Such techniques provide effective means of information search, knowledge discovery, and hypothesis generation. Most previous studies have primarily focused on the design and performance improvement of either named entity recognition or relation extraction. In this paper, we present PKDE4J, a comprehensive text-mining system that integrates dictionary-based entity extraction and rule-based relation extraction in a highly flexible and extensible framework. Starting with the Stanford CoreNLP, we developed the system to cope with multiple types of entities and relations. The system also has fairly good performance in terms of accuracy as well as the ability to configure text-processing components. We demonstrate its competitive performance by evaluating it on many corpora and found that it surpasses existing systems with average F-measures of 85{\%} for entity extraction and 81{\%} for relation extraction.",
author = "Min Song and Kim, {Won Chul} and Dahee Lee and Heo, {Go Eun} and Kang, {Keun Young}",
year = "2015",
month = "10",
day = "1",
doi = "10.1016/j.jbi.2015.08.008",
language = "English",
volume = "57",
pages = "320--332",
journal = "Journal of Biomedical Informatics",
issn = "1532-0464",
publisher = "Academic Press Inc.",

}

TY - JOUR

T1 - PKDE4J

T2 - Entity and relation extraction for public knowledge discovery

AU - Song, Min

AU - Kim, Won Chul

AU - Lee, Dahee

AU - Heo, Go Eun

AU - Kang, Keun Young

PY - 2015/10/1

Y1 - 2015/10/1

N2 - Due to an enormous number of scientific publications that cannot be handled manually, there is a rising interest in text-mining techniques for automated information extraction, especially in the biomedical field. Such techniques provide effective means of information search, knowledge discovery, and hypothesis generation. Most previous studies have primarily focused on the design and performance improvement of either named entity recognition or relation extraction. In this paper, we present PKDE4J, a comprehensive text-mining system that integrates dictionary-based entity extraction and rule-based relation extraction in a highly flexible and extensible framework. Starting with the Stanford CoreNLP, we developed the system to cope with multiple types of entities and relations. The system also has fairly good performance in terms of accuracy as well as the ability to configure text-processing components. We demonstrate its competitive performance by evaluating it on many corpora and found that it surpasses existing systems with average F-measures of 85% for entity extraction and 81% for relation extraction.

UR - http://www.scopus.com/inward/record.url?scp=84949497303&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84949497303&partnerID=8YFLogxK

U2 - 10.1016/j.jbi.2015.08.008

DO - 10.1016/j.jbi.2015.08.008

M3 - Article

VL - 57

SP - 320

EP - 332

JO - Journal of Biomedical Informatics

JF - Journal of Biomedical Informatics

SN - 1532-0464

ER -