Towards efficient searching on the secondary structure of protein sequences

Minkoo Seo, Sanghyun Park, Jung Im Won

Research output: Contribution to journal (Article)

1 Citation (Scopus)

Abstract

Approximate searching on the primary structure (i.e., amino acid arrangement) of protein sequences is an essential part of predicting the functions and evolutionary histories of proteins. However, because evolutionarily distant proteins do not conserve their amino acid residue arrangements, approximate searching on the secondary structure of proteins is important for detecting distant homology. In this paper, we propose an indexing scheme for efficient approximate searching on the secondary structure of protein sequences that can be easily implemented in an RDBMS. By exploiting the concepts of clustering and lookahead, the proposed indexing scheme quickly processes three types of secondary structure queries (i.e., exact match, range match, and wildcard match). To evaluate the performance of the proposed method, we conducted extensive experiments using a set of actual protein sequences. According to the experimental results, the proposed method was faster than existing indexing methods by up to 6.3 times in exact match, 3.3 times in range match, and 1.5 times in wildcard match.
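The three query types named in the abstract can be illustrated with a small sketch. The abstract does not define the query semantics, so everything here is an assumption made for illustration: the secondary structure is taken to be a string over `h` (helix), `e` (sheet), and `l` (loop); it is run-length encoded into segments; and a query is a list of hypothetical `(type, lo, hi)` predicates, each matching one whole segment. The scan is a plain linear pass and is not the clustering-and-lookahead index proposed in the paper.

```python
from itertools import groupby

def segments(ss):
    """Run-length encode a structure string, e.g. 'hhhee' -> [('h', 3), ('e', 2)]."""
    return [(t, len(list(g))) for t, g in groupby(ss)]

def seg_matches(seg, pred):
    """Check one segment against one (type, lo, hi) predicate; '?' is a wildcard type."""
    seg_type, seg_len = seg
    pred_type, lo, hi = pred
    return (pred_type == "?" or pred_type == seg_type) and lo <= seg_len <= hi

def search(ss, query):
    """Return the segment indices at which the query matches consecutive segments."""
    segs = segments(ss)
    hits = []
    for i in range(len(segs) - len(query) + 1):
        if all(seg_matches(segs[i + j], p) for j, p in enumerate(query)):
            hits.append(i)
    return hits

# Illustrative structure string (made up, not real protein data).
ss = "hhhheeelllhhhhhhee"

exact_q = [("h", 4, 4), ("e", 3, 3)]                # exact match: lo == hi
range_q = [("h", 3, 6), ("e", 2, 3)]                # range match: lo <= length <= hi
wild_q = [("h", 4, 6), ("?", 1, 5), ("l", 3, 3)]    # wildcard match: any segment type

print(search(ss, exact_q))
print(search(ss, range_q))
print(search(ss, wild_q))
```

Under these assumptions an exact match is a range match with equal bounds, which is why a single predicate form covers all three query types in the sketch.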

Original language: English
Pages (from-to): 525-542
Number of pages: 18
Journal: Fundamenta Informaticae
Volume: 78
Issue number: 4
Publication status: Published - 2007 Sep 18

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Algebra and Number Theory
  • Information Systems
  • Computational Theory and Mathematics
