Systematic comparison of variant calling pipelines using gold standard personal exome variants

Sohyun Hwang, Eiru Kim, Insuk Lee, Edward M. Marcotte

Research output: Contribution to journal (Article)

113 Citations (Scopus)

Abstract

The success of clinical genomics using next-generation sequencing (NGS) requires the accurate and consistent identification of personal genome variants. Assorted variant calling methods have been developed, but their calls show low concordance. Hence, a systematic comparison of the variant callers could give important guidance to NGS-based clinical genomics. Recently, the Genome in a Bottle (GIAB) consortium published a set of high-confidence variant calls for one individual (NA12878), enabling performance benchmarking of different variant calling pipelines. Based on the gold standard reference variant calls from GIAB, we compared the performance of thirteen variant calling pipelines, testing combinations of three read aligners (BWA-MEM, Bowtie2, and Novoalign) and four variant callers: Genome Analysis Tool Kit HaplotypeCaller (GATK-HC), Samtools mpileup, Freebayes, and Ion Proton Variant Caller (TVC). The pipelines were evaluated on twelve data sets for the NA12878 genome, sequenced on different platforms including Illumina2000, Illumina2500, and Ion Proton, with various exome capture systems and exome coverage. We observed different biases toward specific types of SNP genotyping errors by the different variant callers. The results of our study provide useful guidelines for reliable variant identification from deep sequencing of personal genomes.

Original language: English
Article number: 17875
Journal: Scientific Reports
Volume: 5
DOIs: 10.1038/srep17875
Publication status: Published - 2015 Dec 7

Fingerprint

Exome
Genome
Genomics
Protons
Ions
Benchmarking
High-Throughput Nucleotide Sequencing
Single Nucleotide Polymorphism
Guidelines

All Science Journal Classification (ASJC) codes

  • General

Cite this

@article{19ace0279cf64559a468828fb7e04eee,
title = "Systematic comparison of variant calling pipelines using gold standard personal exome variants",
abstract = "The success of clinical genomics using next-generation sequencing (NGS) requires the accurate and consistent identification of personal genome variants. Assorted variant calling methods have been developed, but their calls show low concordance. Hence, a systematic comparison of the variant callers could give important guidance to NGS-based clinical genomics. Recently, the Genome in a Bottle (GIAB) consortium published a set of high-confidence variant calls for one individual (NA12878), enabling performance benchmarking of different variant calling pipelines. Based on the gold standard reference variant calls from GIAB, we compared the performance of thirteen variant calling pipelines, testing combinations of three read aligners (BWA-MEM, Bowtie2, and Novoalign) and four variant callers: Genome Analysis Tool Kit HaplotypeCaller (GATK-HC), Samtools mpileup, Freebayes, and Ion Proton Variant Caller (TVC). The pipelines were evaluated on twelve data sets for the NA12878 genome, sequenced on different platforms including Illumina2000, Illumina2500, and Ion Proton, with various exome capture systems and exome coverage. We observed different biases toward specific types of SNP genotyping errors by the different variant callers. The results of our study provide useful guidelines for reliable variant identification from deep sequencing of personal genomes.",
author = "Sohyun Hwang and Eiru Kim and Insuk Lee and Marcotte, {Edward M.}",
year = "2015",
month = "12",
day = "7",
doi = "10.1038/srep17875",
language = "English",
volume = "5",
journal = "Scientific Reports",
issn = "2045-2322",
publisher = "Nature Publishing Group",
}

Systematic comparison of variant calling pipelines using gold standard personal exome variants. / Hwang, Sohyun; Kim, Eiru; Lee, Insuk; Marcotte, Edward M.

In: Scientific Reports, Vol. 5, 17875, 07.12.2015.

TY - JOUR

T1 - Systematic comparison of variant calling pipelines using gold standard personal exome variants

AU - Hwang, Sohyun

AU - Kim, Eiru

AU - Lee, Insuk

AU - Marcotte, Edward M.

PY - 2015/12/7

Y1 - 2015/12/7

N2 - The success of clinical genomics using next-generation sequencing (NGS) requires the accurate and consistent identification of personal genome variants. Assorted variant calling methods have been developed, but their calls show low concordance. Hence, a systematic comparison of the variant callers could give important guidance to NGS-based clinical genomics. Recently, the Genome in a Bottle (GIAB) consortium published a set of high-confidence variant calls for one individual (NA12878), enabling performance benchmarking of different variant calling pipelines. Based on the gold standard reference variant calls from GIAB, we compared the performance of thirteen variant calling pipelines, testing combinations of three read aligners (BWA-MEM, Bowtie2, and Novoalign) and four variant callers: Genome Analysis Tool Kit HaplotypeCaller (GATK-HC), Samtools mpileup, Freebayes, and Ion Proton Variant Caller (TVC). The pipelines were evaluated on twelve data sets for the NA12878 genome, sequenced on different platforms including Illumina2000, Illumina2500, and Ion Proton, with various exome capture systems and exome coverage. We observed different biases toward specific types of SNP genotyping errors by the different variant callers. The results of our study provide useful guidelines for reliable variant identification from deep sequencing of personal genomes.

UR - http://www.scopus.com/inward/record.url?scp=84949591330&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84949591330&partnerID=8YFLogxK

U2 - 10.1038/srep17875

DO - 10.1038/srep17875

M3 - Article

C2 - 26639839

AN - SCOPUS:84949591330

VL - 5

JO - Scientific Reports

JF - Scientific Reports

SN - 2045-2322

M1 - 17875

ER -