Rapid and robust denoising of pyrosequenced amplicons for metagenomics

Byunghan Lee, Joonhong Park, Sungroh Yoon

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

Metagenomic sequencing has become a crucial tool for obtaining a gene catalogue of operational taxonomic units (OTUs) in a microbial community. High-throughput pyrosequencing is a next-generation sequencing technique very popular in microbial community analysis due to its longer read length compared to alternative methods. Computational tools are inevitable to process raw data from pyrosequencers, and in particular, noise removal is a critical data-mining step to obtain robust sequence reads. However, the slow rate of existing denoisers has bottlenecked the whole pyrosequencing process, let alone hindering efforts to improve robustness. To address these, we propose a new approach that can accelerate the denoising process substantially. By using our approach, it now takes only about 2 hours to denoise 62,873 pyrosequenced amplicons from a mixture of 91 full-length 16S rRNA clones. It would otherwise take nearly 2.5 days if existing software tools were used. Furthermore, our approach can effectively reduce overestimating the number of OTUs, producing 6.7 times fewer species-level OTUs on average than a state-of-the-art alternative under the same condition. Leveraged by our approach, we hope that metagenomic sequencing will become an even more appealing tool for microbial community analysis.

Original language: English
Title of host publication: Proceedings - 12th IEEE International Conference on Data Mining, ICDM 2012
Pages: 954-959
Number of pages: 6
DOIs: https://doi.org/10.1109/ICDM.2012.68
Publication status: Published - 2012 Dec 1
Event: 12th IEEE International Conference on Data Mining, ICDM 2012 - Brussels, Belgium
Duration: 2012 Dec 10 to 2012 Dec 13

Publication series

Name: Proceedings - IEEE International Conference on Data Mining, ICDM
ISSN (Print): 1550-4786

Other

Other: 12th IEEE International Conference on Data Mining, ICDM 2012
Country: Belgium
City: Brussels
Period: 12/12/10 to 12/12/13

Fingerprint

Data mining
Genes
Throughput

All Science Journal Classification (ASJC) codes

  • Engineering(all)

Cite this

Lee, B., Park, J., & Yoon, S. (2012). Rapid and robust denoising of pyrosequenced amplicons for metagenomics. In Proceedings - 12th IEEE International Conference on Data Mining, ICDM 2012 (pp. 954-959). [6413826] (Proceedings - IEEE International Conference on Data Mining, ICDM). https://doi.org/10.1109/ICDM.2012.68
Lee, Byunghan ; Park, Joonhong ; Yoon, Sungroh. / Rapid and robust denoising of pyrosequenced amplicons for metagenomics. Proceedings - 12th IEEE International Conference on Data Mining, ICDM 2012. 2012. pp. 954-959 (Proceedings - IEEE International Conference on Data Mining, ICDM).
@inproceedings{8187d0cd5e4f40359366e8601b812a16,
title = "Rapid and robust denoising of pyrosequenced amplicons for metagenomics",
abstract = "Metagenomic sequencing has become a crucial tool for obtaining a gene catalogue of operational taxonomic units (OTUs) in a microbial community. High-throughput pyrosequencing is a next-generation sequencing technique very popular in microbial community analysis due to its longer read length compared to alternative methods. Computational tools are inevitable to process raw data from pyrosequencers, and in particular, noise removal is a critical data-mining step to obtain robust sequence reads. However, the slow rate of existing denoisers has bottlenecked the whole pyrosequencing process, let alone hindering efforts to improve robustness. To address these, we propose a new approach that can accelerate the denoising process substantially. By using our approach, it now takes only about 2 hours to denoise 62,873 pyrosequenced amplicons from a mixture of 91 full-length 16S rRNA clones. It would otherwise take nearly 2.5 days if existing software tools were used. Furthermore, our approach can effectively reduce overestimating the number of OTUs, producing 6.7 times fewer species-level OTUs on average than a state-of-the-art alternative under the same condition. Leveraged by our approach, we hope that metagenomic sequencing will become an even more appealing tool for microbial community analysis.",
author = "Byunghan Lee and Joonhong Park and Sungroh Yoon",
year = "2012",
month = "12",
day = "1",
doi = "10.1109/ICDM.2012.68",
language = "English",
isbn = "9780769549057",
series = "Proceedings - IEEE International Conference on Data Mining, ICDM",
pages = "954--959",
booktitle = "Proceedings - 12th IEEE International Conference on Data Mining, ICDM 2012",

}

Lee, B, Park, J & Yoon, S 2012, Rapid and robust denoising of pyrosequenced amplicons for metagenomics. in Proceedings - 12th IEEE International Conference on Data Mining, ICDM 2012., 6413826, Proceedings - IEEE International Conference on Data Mining, ICDM, pp. 954-959, 12th IEEE International Conference on Data Mining, ICDM 2012, Brussels, Belgium, 12/12/10. https://doi.org/10.1109/ICDM.2012.68

Rapid and robust denoising of pyrosequenced amplicons for metagenomics. / Lee, Byunghan; Park, Joonhong; Yoon, Sungroh.

Proceedings - 12th IEEE International Conference on Data Mining, ICDM 2012. 2012. p. 954-959 6413826 (Proceedings - IEEE International Conference on Data Mining, ICDM).

TY - GEN

T1 - Rapid and robust denoising of pyrosequenced amplicons for metagenomics

AU - Lee, Byunghan

AU - Park, Joonhong

AU - Yoon, Sungroh

PY - 2012/12/1

Y1 - 2012/12/1

AB - Metagenomic sequencing has become a crucial tool for obtaining a gene catalogue of operational taxonomic units (OTUs) in a microbial community. High-throughput pyrosequencing is a next-generation sequencing technique very popular in microbial community analysis due to its longer read length compared to alternative methods. Computational tools are inevitable to process raw data from pyrosequencers, and in particular, noise removal is a critical data-mining step to obtain robust sequence reads. However, the slow rate of existing denoisers has bottlenecked the whole pyrosequencing process, let alone hindering efforts to improve robustness. To address these, we propose a new approach that can accelerate the denoising process substantially. By using our approach, it now takes only about 2 hours to denoise 62,873 pyrosequenced amplicons from a mixture of 91 full-length 16S rRNA clones. It would otherwise take nearly 2.5 days if existing software tools were used. Furthermore, our approach can effectively reduce overestimating the number of OTUs, producing 6.7 times fewer species-level OTUs on average than a state-of-the-art alternative under the same condition. Leveraged by our approach, we hope that metagenomic sequencing will become an even more appealing tool for microbial community analysis.

UR - http://www.scopus.com/inward/record.url?scp=84874036155&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84874036155&partnerID=8YFLogxK

U2 - 10.1109/ICDM.2012.68

DO - 10.1109/ICDM.2012.68

M3 - Conference contribution

AN - SCOPUS:84874036155

SN - 9780769549057

T3 - Proceedings - IEEE International Conference on Data Mining, ICDM

SP - 954

EP - 959

BT - Proceedings - 12th IEEE International Conference on Data Mining, ICDM 2012

ER -

Lee B, Park J, Yoon S. Rapid and robust denoising of pyrosequenced amplicons for metagenomics. In Proceedings - 12th IEEE International Conference on Data Mining, ICDM 2012. 2012. p. 954-959. 6413826. (Proceedings - IEEE International Conference on Data Mining, ICDM). https://doi.org/10.1109/ICDM.2012.68