Over the past several decades, biologists have conducted numerous studies examining both general and specific functions of proteins. Generally, if similarities in either the structure or sequence of amino acids exist for two proteins, then a common biological function is expected. Protein function is determined primarily based on the structure rather than the sequence of amino acids. The algorithm for protein structure alignment is an essential tool for the research. The quality of the algorithm depends on the quality of the similarity measure that is used, and the similarity measure is an objective function used to determine the best alignment. However, none of existing similarity measures became golden standard because of their individual strength and weakness. They require excessive filtering to find a single alignment. In this paper, we introduce a new strategy that finds not a single alignment, but multiple alignments with different lengths. This method has obvious benefits of high quality alignment. However, this novel method leads to a new problem that the running time for this method is considerably longer than that for methods that find only a single alignment. To address this problem, we propose algorithms that can locate a common region (CORE) of multiple alignment candidates, and can then extend the CORE into multiple alignments. Because the CORE can be defined from a final alignment, we introduce CORE*that is similar to CORE and propose an algorithm to identify the CORE*. By adopting CORE*and dynamic programming, our proposed method produces multiple alignments of various lengths with higher accuracy than previous methods. In the experiments, the alignments identified by our algorithm are longer than those obtained by TM-align by 17 % and 15.48 %, on average, when the comparison is conducted at the level of super-family and fold, respectively.
Bibliographical noteFunding Information:
Regular Paper This research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology of Korea under Grant No.2012R1A1A3013084. The preliminary version of the paper was published in the Proceedings of EDB2012. ∗Corresponding Author ©2013 Springer Science + Business Media, LLC & Science Press, China
All Science Journal Classification (ASJC) codes
- Theoretical Computer Science
- Hardware and Architecture
- Computer Science Applications
- Computational Theory and Mathematics