The advent of ultra-high-throughput sequencing technology has produced an enormous amount of bio-sequence information, and current advances in the bio-industry are ushering in an era of personalized medicine based on individual genome information. However, analyzing massive numbers of bio-sequences requires large amounts of storage, so such analysis sometimes needs a supercomputer and novel software that can handle that volume of sequence information. For this type of analysis, several sequence-matching algorithms have been devised for alignment and assembly, which are fundamental to bio-sequence analysis. These algorithms treat nucleotide sequences as strings and compare characters one by one, using hash index tables, de Bruijn graphs, the Burrows-Wheeler transform, and so on. In this paper, for time- and space-efficient DNA searching, we propose a simple algorithm that transforms a base sequence into a k-mer integer array and then analyzes the transformed integer array with a unit search operator and a non-unit search operator, resulting in a storage space reduction of about 0.28 fold. Furthermore, based on the proposed algorithm, we have developed a sequence analysis program called CalcGen assembler, and we demonstrate its usefulness in several experiments.
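As a rough illustration of the general idea of turning a base sequence into a k-mer integer array, the sketch below packs each overlapping k-mer into one integer using two bits per base. The mapping A=0, C=1, G=2, T=3 and the names `kmer_integers` and `find_kmer` are assumptions made for this example; the abstract does not specify the encoding used by CalcGen, and the unit and non-unit search operators are not reproduced here.

```python
from typing import List

# Assumed 2-bit code per base; the abstract does not state the actual mapping.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmer_integers(seq: str, k: int) -> List[int]:
    """Return one integer per overlapping k-mer of seq (base-4 packing)."""
    if k <= 0:
        raise ValueError("k must be positive")
    mask = (1 << (2 * k)) - 1  # keeps only the lowest 2*k bits
    value = 0
    out = []
    for i, base in enumerate(seq.upper()):
        # Shift in the new base and drop the base that left the window.
        value = ((value << 2) | BASE_CODE[base]) & mask
        if i >= k - 1:
            out.append(value)
    return out


def find_kmer(seq: str, query: str) -> List[int]:
    """Return start positions where query occurs, by integer comparison."""
    k = len(query)
    target = kmer_integers(query, k)[0]
    return [i for i, v in enumerate(kmer_integers(seq, k)) if v == target]


if __name__ == "__main__":
    print(kmer_integers("ACGT", 2))
    print(find_kmer("ACGTACG", "CG"))
```

With this packing, comparing two k-mers becomes a single integer comparison rather than a character-by-character string comparison, which is the kind of saving the abstract alludes to.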
|Number of pages||7|
|Journal||Procedia Computer Science|
|Publication status||Published - 2013 Jan 1|
|Event||4th International Conference on Computational Systems-Biology and Bioinformatics, CSBio 2013 - Seoul, Korea, Republic of|
Duration: 2013 Nov 7 → 2013 Nov 9
All Science Journal Classification (ASJC) codes
- Computer Science(all)