KXtractor: An effective biomedical information extraction technique based on mixture hidden markov models

Min Song, Il Yeol Song, Xiaohua Hu, Robert B. Allen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

We present a novel information extraction (IE) technique, KXtractor, which combines a text chunking technique and Mixture Hidden Markov Models (MiHMM). KXtractor overcomes the difficulty that single Part-Of-Speech (POS) HMMs have in modeling rich representations of text, where features overlap among state units such as word, line, sentence, and paragraph. KXtractor also resolves issues with traditional HMMs for IE, which operate only on semi-structured data such as HTML documents and other text sources in which language grammar does not play a pivotal role. We compared KXtractor with three IE techniques: 1) RAPIER, an inductive learning-based machine learning system, 2) a dictionary-based extraction system, and 3) a single POS HMM. Our experiments showed that KXtractor outperforms these three IE systems in extracting protein-protein interactions. The F-measure for KXtractor was higher than that of RAPIER, the dictionary-based system, and the single POS HMM by 16.89%, 16.28%, and 8.58%, respectively. In addition, both the precision and the recall of KXtractor are higher than those of the other systems.
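
The abstract compares systems by precision, recall, and F-measure. As a minimal sketch of how these metrics are commonly computed for extracted interaction pairs, the Python snippet below scores a set of predicted pairs against a gold-standard set. The function name, the protein identifiers, and the use of the balanced F-measure (F1) are assumptions for illustration; the record does not state which F-measure variant or evaluation procedure the authors used.

```python
def precision_recall_f1(predicted, gold):
    """Score predicted interaction pairs against gold pairs (balanced F-measure)."""
    true_positives = len(predicted & gold)  # pairs found in both sets
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical data: these pairs are made up and are not from the paper.
gold_pairs = {("p1", "p2"), ("p3", "p4"), ("p5", "p6"), ("p7", "p8")}
predicted_pairs = {("p1", "p2"), ("p3", "p4"), ("p5", "p9")}

p, r, f = precision_recall_f1(predicted_pairs, gold_pairs)
print(f"precision={p:.3f} recall={r:.3f} F1={f:.3f}")
```

Representing each interaction as a tuple in a set keeps the overlap computation to a single set intersection; a real evaluation would also need to decide whether pair order matters.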

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 68-81
Number of pages: 14
DOI: https://doi.org/10.1007/11567752_5
Publication status: Published - 2005 Dec 1
Event: International Workshop on Bioinformatics Research and Applications, IWBRA 2005 - Atlanta, GA, United States
Duration: 2005 May 22 to 2005 May 24

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 3680 LNBI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: International Workshop on Bioinformatics Research and Applications, IWBRA 2005
Country: United States
City: Atlanta, GA
Period: 2005 May 22 to 2005 May 24

Fingerprint

Information Extraction
Hidden Markov models
Markov Model
Glossaries
Learning systems
Inductive Learning
Semistructured Data
HTML
Protein-protein Interaction
Proteins
Learning Systems
Grammar
Experiment
Overlap
Resolve
Machine Learning
Experiments
Unit
Line
Modeling

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Song, M., Song, I. Y., Hu, X., & Allen, R. B. (2005). KXtractor: An effective biomedical information extraction technique based on mixture hidden markov models. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (pp. 68-81). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 3680 LNBI). https://doi.org/10.1007/11567752_5
@inproceedings{b8076b6952f4483d8e9e08a98d393bbb,
title = "KXtractor: An effective biomedical information extraction technique based on mixture hidden markov models",
author = "Min Song and Song, {Il Yeol} and Xiaohua Hu and Allen, {Robert B.}",
year = "2005",
month = "12",
day = "1",
doi = "10.1007/11567752_5",
language = "English",
isbn = "3540294015",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "68--81",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

}

TY - GEN

T1 - KXtractor

T2 - An effective biomedical information extraction technique based on mixture hidden markov models

AU - Song, Min

AU - Song, Il Yeol

AU - Hu, Xiaohua

AU - Allen, Robert B.

PY - 2005/12/1

Y1 - 2005/12/1

UR - http://www.scopus.com/inward/record.url?scp=33646187984&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33646187984&partnerID=8YFLogxK

U2 - 10.1007/11567752_5

DO - 10.1007/11567752_5

M3 - Conference contribution

AN - SCOPUS:33646187984

SN - 3540294015

SN - 9783540294016

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 68

EP - 81

BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

ER -
