Pyrosequencing of the 16S ribosomal RNA gene (16S) has become one of the most popular methods to assess microbial diversity. Pyrosequencing reads containing ambiguous bases (Ns) are generally discarded based on the assumptions of their non-sequence-dependent formation and high error rates. However, taxonomic composition differed by removal of reads with Ns. We determined whether Ns from pyrosequencing occur in a sequence-dependent manner. Our reads and the corresponding flow value data revealed occurrence of sequence-specific N errors with a common sequential pattern (a homopolymer + a few nucleotides with bases other than the homopolymer + N) and revealed that the nucleotide base of the homopolymer is the true base for the following N. Using an algorithm reflecting this sequence-dependent pattern, we corrected the Ns in the 16S (86.54%), bphD (81.37%) and nifH (81.55%) amplicon reads from a mock community with high precisions of 95.4, 96.9 and 100%, respectively the new N correction method was applicable for determining most of Ns in amplicon reads from a soil sample, resulting in reducing taxonomic biases associated with N errors and in shotgun sequencing reads from public metagenome data the method improves the accuracy and precision of microbial community analysis and genome sequencing using 454 pyrosequencing.
Bibliographical noteFunding Information:
‘‘The GAIA Project’’ program of Korea Ministry of Environment. Funding for the open access charge: Korea Ministry of Environment, The GAIA Project.
All Science Journal Classification (ASJC) codes