BulkAligner

A novel sequence alignment algorithm based on graph theory and Trinity

Junsu Lee, Yunku Yeu, Hongchan Roh, Youngmi Yoon, Sang Hyun Park

Research output: Contribution to journal › Article

2 Citations (Scopus)

Abstract

Sequence alignment is a widely used tool in genomics. With the development of next-generation sequencing (NGS) technology, the production of sequence read data has recently increased. A number of read alignment algorithms for handling NGS data have been developed. However, these algorithms suffer from a trade-off between throughput and alignment quality, due to the large computational cost of processing repeat reads. Conversely, alignment algorithms on distributed systems such as Hadoop and Trinity can obtain better throughput than existing algorithms on a single machine without compromising alignment quality. In this paper, we propose BulkAligner, a novel sequence alignment algorithm on the graph-based in-memory distributed system Trinity. We convert the original reference sequence into graph form and perform sequence alignment by finding the longest paths on the graph. Our experimental results show that BulkAligner achieves at least 1.8× and up to 57× better throughput, with the same or higher quality, than existing algorithms on Hadoop. We analyze the scalability and show that better throughput can be obtained by simply adding machines.
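The idea of turning the reference into a graph and aligning by longest path can be illustrated with a small single-machine sketch. This is not the authors' implementation and does not use Trinity: it assumes exact k-mer matching only, treats each (read offset, reference position) k-mer match as a node with an edge to the match one step further along both sequences, and finds the longest such path by dynamic programming. The names `build_kmer_graph`, `align_read`, and the parameter `k` are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def build_kmer_graph(reference, k):
    """Index every k-mer of the reference by its start positions.

    Each position is a graph node; the edge pos -> pos + 1 is implicit.
    (Illustrative structure only, not the layout used in the paper.)
    """
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def align_read(read, index, k):
    """Find the longest path of consecutive exact k-mer matches.

    Returns (ref_start, read_start, path_length_in_kmers), or
    (None, None, 0) when no k-mer of the read occurs in the reference.
    """
    best = (None, None, 0)
    prev = {}  # reference position -> path length ending at previous offset
    for offset in range(len(read) - k + 1):
        cur = {}
        for pos in index.get(read[offset:offset + k], ()):
            # Extend the path that ended one step earlier in both sequences.
            length = prev.get(pos - 1, 0) + 1
            cur[pos] = length
            if length > best[2]:
                best = (pos - length + 1, offset - length + 1, length)
        prev = cur
    return best


reference = "ACGTACGGTACCGT"
k = 3
index = build_kmer_graph(reference, k)
ref_start, read_start, n_kmers = align_read("CGGTAC", index, k)
matched = n_kmers + k - 1  # number of bases covered by the path
print(ref_start, read_start, reference[ref_start:ref_start + matched])
```

A path of `n` consecutive k-mer matches covers `n + k - 1` bases, which is why the matched span is computed that way. A distributed version would partition the graph nodes across machines; that part is outside the scope of this sketch.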

Original language: English
Pages (from-to): 120-133
Number of pages: 14
Journal: Information Sciences
Volume: 303
DOIs: 10.1016/j.ins.2015.01.011
Publication status: Published - 2015 May 10

Fingerprint

Sequence Alignment
Graph theory
Alignment
Throughput
Sequencing
Distributed Systems
Graph in graph theory
Longest Path
Single Machine
Genomics
Computational Cost
Scalability
Trade-offs
Data storage equipment
Experimental Results
Processing
Graph
Costs

All Science Journal Classification (ASJC) codes

  • Control and Systems Engineering
  • Theoretical Computer Science
  • Software
  • Computer Science Applications
  • Information Systems and Management
  • Artificial Intelligence

Cite this

Lee, Junsu; Yeu, Yunku; Roh, Hongchan; Yoon, Youngmi; Park, Sang Hyun. / BulkAligner: A novel sequence alignment algorithm based on graph theory and Trinity. In: Information Sciences. 2015; Vol. 303. pp. 120-133.
@article{a89b3a661ade41dcb7d90a669856e7b8,
title = "BulkAligner: A novel sequence alignment algorithm based on graph theory and Trinity",
abstract = "Sequence alignment is a widely used tool in genomics. With the development of next-generation sequencing (NGS) technology, the production of sequence read data has recently increased. A number of read alignment algorithms for handling NGS data have been developed. However, these algorithms suffer from a trade-off between throughput and alignment quality, due to the large computational cost of processing repeat reads. Conversely, alignment algorithms on distributed systems such as Hadoop and Trinity can obtain better throughput than existing algorithms on a single machine without compromising alignment quality. In this paper, we propose BulkAligner, a novel sequence alignment algorithm on the graph-based in-memory distributed system Trinity. We convert the original reference sequence into graph form and perform sequence alignment by finding the longest paths on the graph. Our experimental results show that BulkAligner achieves at least 1.8× and up to 57× better throughput, with the same or higher quality, than existing algorithms on Hadoop. We analyze the scalability and show that better throughput can be obtained by simply adding machines.",
author = "Junsu Lee and Yunku Yeu and Hongchan Roh and Youngmi Yoon and Park, {Sang Hyun}",
year = "2015",
month = "5",
day = "10",
doi = "10.1016/j.ins.2015.01.011",
language = "English",
volume = "303",
pages = "120--133",
journal = "Information Sciences",
issn = "0020-0255",
publisher = "Elsevier Inc.",

}

TY - JOUR

T1 - BulkAligner

T2 - A novel sequence alignment algorithm based on graph theory and Trinity

AU - Lee, Junsu

AU - Yeu, Yunku

AU - Roh, Hongchan

AU - Yoon, Youngmi

AU - Park, Sang Hyun

PY - 2015/5/10

Y1 - 2015/5/10

N2 - Sequence alignment is a widely used tool in genomics. With the development of next-generation sequencing (NGS) technology, the production of sequence read data has recently increased. A number of read alignment algorithms for handling NGS data have been developed. However, these algorithms suffer from a trade-off between throughput and alignment quality, due to the large computational cost of processing repeat reads. Conversely, alignment algorithms on distributed systems such as Hadoop and Trinity can obtain better throughput than existing algorithms on a single machine without compromising alignment quality. In this paper, we propose BulkAligner, a novel sequence alignment algorithm on the graph-based in-memory distributed system Trinity. We convert the original reference sequence into graph form and perform sequence alignment by finding the longest paths on the graph. Our experimental results show that BulkAligner achieves at least 1.8× and up to 57× better throughput, with the same or higher quality, than existing algorithms on Hadoop. We analyze the scalability and show that better throughput can be obtained by simply adding machines.

UR - http://www.scopus.com/inward/record.url?scp=84925674981&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84925674981&partnerID=8YFLogxK

U2 - 10.1016/j.ins.2015.01.011

DO - 10.1016/j.ins.2015.01.011

M3 - Article

VL - 303

SP - 120

EP - 133

JO - Information Sciences

JF - Information Sciences

SN - 0020-0255

ER -