An automatic unsupervised querying algorithm for efficient information extraction in biomedical domain

Min Song, Il Yeol Song, Xiaohua Hu, Robert B. Allen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In the domain of bioinformatics, extracting a relation such as protein-protein interactions from a large database of text documents is a challenging task. One major issue in biomedical information extraction is how to efficiently digest the sheer size of the unstructured biomedical data corpus. Often, only a small fraction of the documents in such a corpus contain information that is relevant to the extraction task. We propose a novel query expansion algorithm that automatically discovers the characteristics of documents that are useful for extracting a target relation. Our technique introduces a hybrid query re-weighting algorithm that combines a modified Robertson Sparck-Jones query ranking algorithm with a keyphrase extraction algorithm. It also adopts a novel query translation technique that incorporates part-of-speech (POS) categories into query translation. We conduct a series of experiments and report the results. They show that our technique retrieves more documents containing protein-protein pairs from MEDLINE as the number of iterations increases. We also compare our technique with SLIPPER, a supervised rule-based query expansion technique. The results show that our technique outperforms SLIPPER by 17.90% to 29.98% over four iterations.
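For readers unfamiliar with the Robertson Sparck-Jones weighting mentioned in the abstract, the following is a minimal sketch of the standard relevance weight with 0.5 smoothing, used here to rank candidate expansion terms. It is not the modified variant or the hybrid keyphrase scheme the authors propose, whose details the abstract does not give. The function names, the `term_stats` layout, and the sample counts are all hypothetical and chosen only for illustration.

```python
import math


def rsj_weight(r, n, R, N):
    """Standard Robertson/Sparck Jones relevance weight (0.5 smoothing).

    r: number of relevant documents that contain the term
    n: number of documents in the collection that contain the term
    R: number of known relevant documents
    N: number of documents in the collection
    """
    odds_in_relevant = (r + 0.5) / (R - r + 0.5)
    odds_in_non_relevant = (n - r + 0.5) / (N - n - R + r + 0.5)
    return math.log(odds_in_relevant / odds_in_non_relevant)


def rank_expansion_terms(term_stats, R, N, top_k=3):
    """Rank candidate terms by RSJ weight, highest first.

    term_stats maps each term to a (r, n) pair (hypothetical layout).
    """
    scored = [(term, rsj_weight(r, n, R, N)) for term, (r, n) in term_stats.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Made-up counts: 10 relevant documents in a 1000-document collection.
    stats = {"binds": (8, 20), "the": (9, 950)}
    for term, weight in rank_expansion_terms(stats, R=10, N=1000):
        print(f"{term}: {weight:.3f}")
```

In this sketch a term concentrated in the relevant documents ("binds") receives a high positive weight, while a term that appears almost everywhere ("the") receives a negative one, which is the behaviour a query re-weighting step relies on when it picks expansion terms.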

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 173-179
Number of pages: 7
Publication status: Published - 2005 Dec 1
Event: 9th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD 2005 - Hanoi, Viet Nam
Duration: 2005 May 18 → 2005 May 20

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 3518 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 9th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD 2005
Country: Viet Nam
City: Hanoi
Period: 05/5/18 → 05/5/20

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Song, M., Song, I. Y., Hu, X., & Allen, R. B. (2005). An automatic unsupervised querying algorithm for efficient information extraction in biomedical domain. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (pp. 173-179). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 3518 LNAI).