A novel indexing method for efficient sequence matching in large DNA database environment

Jung Im Won, Jee Hee Yoon, Sanghyun Park, Sang Wook Kim

Research output: Chapter in Book/Report/Conference proceedingConference contribution

2 Citations (Scopus)

Abstract

In molecular biology, DNA sequence matching is one of the most crucial operations. Since DNA databases contain a huge volume of sequences, fast indexes are essential for efficient processing of DNA sequence matching. In this paper, we first point out the problems of the suffix tree, an index structure widely-used for DNA sequence matching, in the respects of the storage overhead, search performance, and difficulty in seamless integration with DBMS. Then, we propose a new index structure that resolves such problems. The proposed index structure consists of the two parts: the primary part realizes the trie as binary bit-string representation without any pointers, and the secondary part helps fast accesses of leaf nodes of the trie that need to be accessed for post-processing. We also suggest efficient algorithms based on that index for DNA sequence matching. To verify the superiority of the proposed approach, we conduct performance evaluation via a series of experiments. The results reveal that the proposed approach, which requires smaller storage space, can be a few orders of magnitude faster than the suffix tree.

Original languageEnglish
Title of host publicationAdvances in Knowledge Discovery and Data Mining - 9th Pacific-Asia Conference, PAKDD 2005, Proceedings
PublisherSpringer Verlag
Pages203-215
Number of pages13
ISBN (Print)3540260765, 9783540260769
DOIs
Publication statusPublished - 2005 Jan 1
Event9th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD 2005 - Hanoi, Viet Nam
Duration: 2005 May 182005 May 20

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume3518 LNAI
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Other

Other9th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD 2005
CountryViet Nam
CityHanoi
Period05/5/1805/5/20

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science(all)

Fingerprint Dive into the research topics of 'A novel indexing method for efficient sequence matching in large DNA database environment'. Together they form a unique fingerprint.

  • Cite this

    Won, J. I., Yoon, J. H., Park, S., & Kim, S. W. (2005). A novel indexing method for efficient sequence matching in large DNA database environment. In Advances in Knowledge Discovery and Data Mining - 9th Pacific-Asia Conference, PAKDD 2005, Proceedings (pp. 203-215). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 3518 LNAI). Springer Verlag. https://doi.org/10.1007/11430919_26