An automatic unsupervised querying algorithm for efficient information extraction in biomedical domain

Min Song, Il Yeol Song, Xiaohua Hu, Robert B. Allen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In the domain of bioinformatics, extracting a relation such as protein-protein interactions from a large database of text documents is a challenging task. One major issue with biomedical information extraction is how to efficiently digest the sheer size of the unstructured biomedical data corpus. Often, within this huge body of biomedical data, only a small fraction of the documents contain information that is relevant to the extraction task. We propose a novel query expansion algorithm to automatically discover the characteristics of documents that are useful for extraction of a target relation. Our technique introduces a hybrid query re-weighting algorithm combining the modified Robertson Sparck-Jones query ranking algorithm with a keyphrase extraction algorithm. Our technique also adopts a novel query translation technique that incorporates POS categories into query translation. We conduct a series of experiments and report the experimental results. The results show that our technique is able to retrieve more documents that contain protein-protein pairs from MEDLINE as the number of iterations increases. Our technique is also compared with SLIPPER, a supervised rule-based query expansion technique. The results show that our technique outperforms SLIPPER by 17.90% to 29.98% over four iterations.
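As a rough illustration of the term re-weighting idea mentioned in the abstract, the sketch below computes the classic Robertson/Sparck Jones relevance weight (with the usual 0.5 smoothing) and uses it to rank candidate expansion terms. This is the textbook form only; the abstract refers to a modified version combined with keyphrase extraction, and those details are not given here. The function names `rsj_weight` and `rank_expansion_terms`, the representation of documents as sets of tokens, and the `top_k` parameter are all assumptions made for this example.

```python
import math


def rsj_weight(r, R, n, N):
    """Classic Robertson/Sparck Jones relevance weight with 0.5 smoothing.

    r: relevant documents containing the term
    R: relevant documents in total
    n: documents containing the term
    N: documents in the collection
    (Textbook form; not the modified variant the abstract refers to.)
    """
    if not (0 <= r <= R <= N and r <= n <= N and n - r <= N - R):
        raise ValueError("inconsistent counts")
    odds_in_relevant = (r + 0.5) / (R - r + 0.5)
    odds_in_non_relevant = (n - r + 0.5) / (N - n - R + r + 0.5)
    return math.log(odds_in_relevant / odds_in_non_relevant)


def rank_expansion_terms(relevant_docs, all_docs, top_k=5):
    """Rank terms seen in relevant documents by RSJ weight.

    Hypothetical helper: each document is a set of tokens, and
    relevant_docs is assumed to be a subset of all_docs.
    """
    R, N = len(relevant_docs), len(all_docs)
    vocab = set().union(*relevant_docs) if relevant_docs else set()
    scored = []
    for term in vocab:
        r = sum(term in doc for doc in relevant_docs)
        n = sum(term in doc for doc in all_docs)
        scored.append((rsj_weight(r, R, n, N), term))
    # Highest weight first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [term for _, term in scored[:top_k]]
```

Terms that occur mostly in documents already judged useful for the target relation receive high weights and become candidates for the next query; a term spread evenly across relevant and non-relevant documents gets a weight near zero.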

Original language: English
Title of host publication: Advances in Knowledge Discovery and Data Mining - 9th Pacific-Asia Conference, PAKDD 2005, Proceedings
Publisher: Springer Verlag
Pages: 173-179
Number of pages: 7
ISBN (Print): 3540260765, 9783540260769
Publication status: Published - 2005
Event: 9th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD 2005 - Hanoi, Viet Nam
Duration: 2005 May 18 → 2005 May 20

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 3518 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 9th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD 2005
Country: Viet Nam
City: Hanoi
Period: 05/5/18 → 05/5/20

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)
