CSI: Clustered segment indexing for efficient approximate searching on the secondary structure of protein sequences

Minkoo Seo, Sanghyun Park, Jung Im Won

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Approximate searching on the primary structure (i.e., the amino acid arrangement) of protein sequences is essential for predicting the functions and evolutionary histories of proteins. However, because evolutionarily distant proteins do not conserve their amino acid residue arrangements, approximate searching on the secondary structure of proteins is important for detecting distant homology. In this paper, we propose an indexing scheme for efficient approximate searching on the secondary structure of protein sequences that can be easily implemented in an RDBMS. Exploiting clustering and lookahead, the proposed scheme efficiently processes three types of secondary structure queries: exact match, range match, and wildcard match. To evaluate its performance, we conducted extensive experiments on a set of real protein sequences. In these experiments, the proposed method was up to 6.3, 3.3, and 1.5 times faster than existing indexing methods for exact match, range match, and wildcard match queries, respectively.
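The record names the three query types but does not define them. As a rough illustration only, the Python sketch below makes several assumptions that do not come from the paper: a three-state alphabet (H = helix, E = strand, C = coil), a sequence treated as runs ("segments") of identical states, and a query modeled as a list of hypothetical `(state, min_len, max_len)` elements, with `"*"` as a wildcard state. The lookup uses a plain inverted index on `(state, length)` followed by verification. It is not the CSI scheme, whose clustering and lookahead techniques are not described in this record.

```python
from collections import defaultdict
from itertools import groupby
from typing import Dict, List, Tuple

# Assumed representation (not taken from the paper): H = helix, E = strand, C = coil.
Segment = Tuple[str, int]          # (state, run length)
QueryElem = Tuple[str, int, int]   # (state or "*" wildcard, min length, max length)
Location = Tuple[str, int]         # (sequence id, segment position)


def to_segments(ss: str) -> List[Segment]:
    """Run-length encode a secondary structure string, e.g. "HHEEE" -> [("H", 2), ("E", 3)]."""
    return [(state, sum(1 for _ in run)) for state, run in groupby(ss)]


def elem_matches(seg: Segment, q: QueryElem) -> bool:
    """True if a segment satisfies one query element (state test plus length range)."""
    state, length = seg
    q_state, lo, hi = q
    return (q_state == "*" or q_state == state) and lo <= length <= hi


def build_index(db: Dict[str, str]) -> Tuple[Dict[Segment, List[Location]], Dict[str, List[Segment]]]:
    """Hypothetical segment index mapping each (state, length) key to where it occurs.

    In an RDBMS this could be a table keyed on (state, length) with a B-tree;
    the dict here is only a stand-in.
    """
    index: Dict[Segment, List[Location]] = defaultdict(list)
    segmented: Dict[str, List[Segment]] = {}
    for seq_id, ss in db.items():
        segs = to_segments(ss)
        segmented[seq_id] = segs
        for pos, seg in enumerate(segs):
            index[seg].append((seq_id, pos))
    return index, segmented


def search(index: Dict[Segment, List[Location]],
           segmented: Dict[str, List[Segment]],
           query: List[QueryElem]) -> List[Location]:
    """Find candidate start segments via the index, then verify the remaining query elements."""
    candidates = [loc for key, locs in index.items()
                  if elem_matches(key, query[0]) for loc in locs]
    hits = []
    for seq_id, pos in candidates:
        segs = segmented[seq_id]
        if pos + len(query) <= len(segs) and all(
                elem_matches(segs[pos + i], q) for i, q in enumerate(query)):
            hits.append((seq_id, pos))
    return sorted(hits)


db = {"p1": "HHHHEEECCC", "p2": "CCHHHHHEECCC", "p3": "EEEHHHHCCC"}
index, segmented = build_index(db)
print(search(index, segmented, [("H", 4, 4), ("E", 3, 3)]))  # exact:    [('p1', 0)]
print(search(index, segmented, [("H", 4, 5), ("E", 2, 3)]))  # range:    [('p1', 0), ('p2', 1)]
print(search(index, segmented, [("*", 1, 9), ("H", 4, 5)]))  # wildcard: [('p2', 0), ('p3', 0)]
```

In this toy model an exact match is simply a range whose minimum and maximum lengths coincide, and a wildcard element accepts any state within its length bounds; the paper's actual query semantics may differ.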

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 237-247
Number of pages: 11
Volume: 3488 LNAI
Publication status: Published - 2005 Dec 1
Event: 15th International Symposium on Methodologies for Intelligent Systems, ISMIS 2005 - Saratoga Springs, NY, United States
Duration: 2005 May 25 - 2005 May 28

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 3488 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 15th International Symposium on Methodologies for Intelligent Systems, ISMIS 2005
Country: United States
City: Saratoga Springs, NY
Period: 05/5/25 - 05/5/28

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Seo, M., Park, S., & Won, J. I. (2005). CSI: Clustered segment indexing for efficient approximate searching on the secondary structure of protein sequences. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3488 LNAI, pp. 237-247). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 3488 LNAI).
Seo, Minkoo; Park, Sanghyun; Won, Jung Im. / CSI: Clustered segment indexing for efficient approximate searching on the secondary structure of protein sequences. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 3488 LNAI 2005. pp. 237-247 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{fb442f32646444f686244f766d8dbb0c,
title = "CSI: Clustered segment indexing for efficient approximate searching on the secondary structure of protein sequences",
abstract = "Approximate searching on the primary structure (i.e., amino acid arrangement) of protein sequences is an essential part in predicting the functions and evolutionary histories of proteins. However, because proteins distant in an evolutionary history do not conserve amino acid residue arrangements, approximate searching on proteins' secondary structure is quite important in finding out distant homology. In this paper, we propose an indexing scheme for efficient approximate searching on the secondary structure of protein sequences which can be easily implemented in RDBMS. Exploiting the concept of clustering and lookahead, the proposed indexing scheme processes three types of secondary structure queries (i.e., exact match, range match, and wildcard match) very quickly. To evaluate the performance of the proposed method, we conducted extensive experiments using a set of actual protein sequences. According to the experimental results, the proposed method was proved to be faster than the existing indexing methods up to 6.3 times in exact match, 3.3 times in range match, and 1.5 times in wildcard match, respectively.",
author = "Minkoo Seo and Sanghyun Park and Won, {Jung Im}",
year = "2005",
month = "12",
day = "1",
language = "English",
isbn = "3540258787",
volume = "3488 LNAI",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "237--247",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

}

Seo, M, Park, S & Won, JI 2005, CSI: Clustered segment indexing for efficient approximate searching on the secondary structure of protein sequences. in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). vol. 3488 LNAI, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 3488 LNAI, pp. 237-247, 15th International Symposium on Methodologies for Intelligent Systems, ISMIS 2005, Saratoga Springs, NY, United States, 05/5/25.

CSI: Clustered segment indexing for efficient approximate searching on the secondary structure of protein sequences. / Seo, Minkoo; Park, Sanghyun; Won, Jung Im.

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 3488 LNAI 2005. p. 237-247 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 3488 LNAI).

TY - GEN

T1 - CSI: Clustered segment indexing for efficient approximate searching on the secondary structure of protein sequences

AU - Seo, Minkoo

AU - Park, Sanghyun

AU - Won, Jung Im

PY - 2005/12/1

Y1 - 2005/12/1

AB - Approximate searching on the primary structure (i.e., amino acid arrangement) of protein sequences is an essential part in predicting the functions and evolutionary histories of proteins. However, because proteins distant in an evolutionary history do not conserve amino acid residue arrangements, approximate searching on proteins' secondary structure is quite important in finding out distant homology. In this paper, we propose an indexing scheme for efficient approximate searching on the secondary structure of protein sequences which can be easily implemented in RDBMS. Exploiting the concept of clustering and lookahead, the proposed indexing scheme processes three types of secondary structure queries (i.e., exact match, range match, and wildcard match) very quickly. To evaluate the performance of the proposed method, we conducted extensive experiments using a set of actual protein sequences. According to the experimental results, the proposed method was proved to be faster than the existing indexing methods up to 6.3 times in exact match, 3.3 times in range match, and 1.5 times in wildcard match, respectively.

UR - http://www.scopus.com/inward/record.url?scp=26944496056&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=26944496056&partnerID=8YFLogxK

M3 - Conference contribution

SN - 3540258787

SN - 9783540258780

VL - 3488 LNAI

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 237

EP - 247

BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

ER -

Seo M, Park S, Won JI. CSI: Clustered segment indexing for efficient approximate searching on the secondary structure of protein sequences. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 3488 LNAI. 2005. p. 237-247. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).