A practical method for approximate subsequence search in DNA databases

Jung Im Won, Sang Kyoon Hong, Jee Hee Yoon, Sang Hyun Park, Sang Wook Kim

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

2 Citations (Scopus)

Abstract

In this paper, we propose an accurate and efficient method for approximate subsequence search in large DNA databases. The method adopts a binary trie as its primary index structure and stores in it all window subsequences extracted from a DNA sequence. To answer an approximate subsequence query, it traverses the binary trie breadth-first and uses dynamic programming to retrieve all matching subsequences along the traversed paths. Because the trie stores only window subsequences of a predetermined length, however, this approach incurs a large post-processing cost for long query sequences. To overcome this problem, we divide a long query sequence into shorter pieces, search for each piece, and then merge the results.
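
The trie-plus-dynamic-programming search described above can be illustrated with a small sketch. It is not the authors' implementation: the 2-bit base encoding (A=00, C=01, G=10, T=11), unit-cost edit distance as the matching criterion, the window length `w`, and the names `build_trie` and `search` are all assumptions made for illustration. Each window is inserted as a bit string, and a breadth-first traversal carries one edit-distance column per trie path, updating it each time two bits complete a base and pruning any path whose column minimum already exceeds the error bound `k`.

```python
from collections import deque

# Assumed 2-bit encoding of nucleotides (hypothetical; not taken from the paper).
ENCODE = {"A": (0, 0), "C": (0, 1), "G": (1, 0), "T": (1, 1)}
DECODE = {bits: base for base, bits in ENCODE.items()}


class Node:
    __slots__ = ("children", "starts")

    def __init__(self):
        self.children = [None, None]  # binary trie: one child per bit value
        self.starts = []              # start offsets of windows ending at this node


def build_trie(seq, w):
    """Insert every window seq[i:i+w] (shorter near the end) as a bit string."""
    root = Node()
    for i in range(len(seq)):
        node = root
        for base in seq[i:i + w]:
            for bit in ENCODE[base]:
                if node.children[bit] is None:
                    node.children[bit] = Node()
                node = node.children[bit]
        node.starts.append(i)
    return root


def subtree_starts(node):
    """Collect the start offsets of all windows stored in node's subtree."""
    out, stack = [], [node]
    while stack:
        n = stack.pop()
        out.extend(n.starts)
        stack.extend(c for c in n.children if c is not None)
    return out


def search(root, query, k):
    """Breadth-first traversal with one edit-distance DP column per trie path.

    Returns sorted (start, length) pairs such that seq[start:start+length]
    is within edit distance k of query (length is at most w).
    """
    m = len(query)
    matches = set()
    # Queue entries: (node, DP column at the last completed base,
    #                 first bit of a half-read base or None, depth in bases).
    queue = deque([(root, list(range(m + 1)), None, 0)])
    while queue:
        node, col, pending, depth = queue.popleft()
        for bit, child in enumerate(node.children):
            if child is None:
                continue
            if pending is None:
                # First bit of a base: no new DP column yet.
                queue.append((child, col, bit, depth))
                continue
            base = DECODE[(pending, bit)]
            new = [col[0] + 1]
            for i in range(1, m + 1):
                cost = 0 if query[i - 1] == base else 1
                new.append(min(col[i] + 1,          # extra base in the path
                               new[i - 1] + 1,      # extra base in the query
                               col[i - 1] + cost))  # match or substitution
            if new[m] <= k:
                # Every window below child shares this matching prefix.
                for s in subtree_starts(child):
                    matches.add((s, depth + 1))
            if min(new) <= k:
                # The column minimum never decreases along a path, so a
                # column entirely above k can be pruned safely.
                queue.append((child, new, None, depth + 1))
    return sorted(matches)
```

For example, `search(build_trie("AAC", 3), "AC", 1)` returns `[(0, 1), (0, 2), (0, 3), (1, 1), (1, 2), (2, 1)]`. Because trie paths are at most `w` bases deep, a query longer than `w + k` bases can have no match inside a single window, which is the long-query limitation the abstract addresses by splitting the query; the splitting and merging step is not shown in this sketch.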

Original language: English
Title of host publication: Advances in Knowledge Discovery and Data Mining - 11th Pacific-Asia Conference, PAKDD 2007, Proceedings
Pages: 921-931
Number of pages: 11
Publication status: Published - 2007 Dec 1
Event: 11th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2007 - Nanjing, China
Duration: 2007 May 22 to 2007 May 25

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4426 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science(all)

Cite this

Won, J. I., Hong, S. K., Yoon, J. H., Park, S. H., & Kim, S. W. (2007). A practical method for approximate subsequence search in DNA databases. In Advances in Knowledge Discovery and Data Mining - 11th Pacific-Asia Conference, PAKDD 2007, Proceedings (pp. 921-931). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 4426 LNAI).
@inproceedings{9cdf24993dc0480ebb39c36ccb8837d2,
title = "A practical method for approximate subsequence search in DNA databases",
author = "Won, {Jung Im} and Hong, {Sang Kyoon} and Yoon, {Jee Hee} and Park, {Sang Hyun} and Kim, {Sang Wook}",
year = "2007",
month = "12",
day = "1",
language = "English",
isbn = "9783540717003",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "921--931",
booktitle = "Advances in Knowledge Discovery and Data Mining - 11th Pacific-Asia Conference, PAKDD 2007, Proceedings",

}

TY - GEN

T1 - A practical method for approximate subsequence search in DNA databases

AU - Won, Jung Im

AU - Hong, Sang Kyoon

AU - Yoon, Jee Hee

AU - Park, Sang Hyun

AU - Kim, Sang Wook

PY - 2007/12/1

Y1 - 2007/12/1

UR - http://www.scopus.com/inward/record.url?scp=38049171347&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=38049171347&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:38049171347

SN - 9783540717003

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 921

EP - 931

BT - Advances in Knowledge Discovery and Data Mining - 11th Pacific-Asia Conference, PAKDD 2007, Proceedings

ER -
