A DNA index structure using frequency and position information of genetic alphabet

Woo Cheol Kim, Sanghyun Park, Jung Im Won, Sang Wook Kim, Jee Hee Yoon

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Exact match, wildcard match, and k-mismatch queries are widely used in molecular biology applications, including searches for ESTs (Expressed Sequence Tags) and DNA transcription factors. In this paper, we propose an efficient indexing and query processing mechanism for such queries. Our indexing method places a sliding window at every possible location of a DNA sequence and extracts a signature from each window by considering the occurrence frequency of each nucleotide. It then stores the set of signatures in a multi-dimensional index, such as the R*-tree. In addition, by assigning a weight to each position of a window, it prevents signatures from being concentrated around a few spots in the indexing space. Our query processing method converts a query sequence into a multi-dimensional rectangle and searches the index for the signatures that overlap the rectangle.
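
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, a linear scan stands in for the R*-tree, the weight values are arbitrary (the abstract does not give the weighting scheme), and the query rectangle is derived only for the unweighted frequency signature. The rectangle relies on the fact that each mismatch changes any single nucleotide count by at most one, so a window within k mismatches of the query has every count within k of the query's count.

```python
from collections import Counter

ALPHABET = "ACGT"  # one signature dimension per nucleotide


def frequency_signature(window):
    """Occurrence count of each nucleotide in the window, as a 4-d point."""
    counts = Counter(window)
    return tuple(counts[base] for base in ALPHABET)


def weighted_signature(window, weights):
    """Sum of per-position weights for each nucleotide.

    The weights are an assumption for illustration; the abstract only
    states that positions are weighted to spread signatures out.
    """
    totals = dict.fromkeys(ALPHABET, 0)
    for base, weight in zip(window, weights):
        totals[base] += weight
    return tuple(totals[base] for base in ALPHABET)


def query_rectangle(query, k):
    """Per-dimension (low, high) bounds for unweighted signatures of
    windows that differ from the query in at most k positions."""
    width = len(query)
    return [(max(0, c - k), min(width, c + k))
            for c in frequency_signature(query)]


def search(sequence, query, k=0):
    """Filter windows by signature, then verify by Hamming distance.

    A linear scan replaces the multi-dimensional index used in the paper.
    """
    width = len(query)
    rect = query_rectangle(query, k)
    hits = []
    for pos in range(len(sequence) - width + 1):
        window = sequence[pos:pos + width]
        sig = frequency_signature(window)
        inside = all(lo <= c <= hi for c, (lo, hi) in zip(sig, rect))
        if inside and sum(a != b for a, b in zip(window, query)) <= k:
            hits.append(pos)
    return hits
```

For example, `search("ACGTACCT", "ACGT", k=1)` checks five windows of length 4 and reports the start positions of those within one mismatch of the query. The windows `ACGT` and `TGCA` share the same frequency signature, while `weighted_signature` with weights `[1, 2, 3, 4]` separates them, which is the kind of spreading effect the abstract attributes to position weights.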

Original language: English
Title of host publication: Advances in Knowledge Discovery and Data Mining - 9th Pacific-Asia Conference, PAKDD 2005, Proceedings
Publisher: Springer Verlag
Pages: 162-172
Number of pages: 11
ISBN (Print): 3540260765, 9783540260769
DOI: https://doi.org/10.1007/11430919_21
Publication status: Published - 2005
Event: 9th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD 2005 - Hanoi, Viet Nam
Duration: 2005 May 18 to 2005 May 20

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 3518 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 9th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD 2005
Country: Viet Nam
City: Hanoi
Period: 05/5/18 to 05/5/20

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science(all)

  • Cite this

    Kim, W. C., Park, S., Won, J. I., Kim, S. W., & Yoon, J. H. (2005). A DNA index structure using frequency and position information of genetic alphabet. In Advances in Knowledge Discovery and Data Mining - 9th Pacific-Asia Conference, PAKDD 2005, Proceedings (pp. 162-172). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 3518 LNAI). Springer Verlag. https://doi.org/10.1007/11430919_21