Next Generation Proteomic Pipeline for Chromosome-Based Proteomic Research Using NeXtProt and GENCODE Databases

Heeyoun Hwang, Gun Wook Park, Ji Yeong Park, Hyun Kyoung Lee, Ju Yeon Lee, Ji Eun Jeong, Sung Kyu Robin Park, John R. Yates, Kyung Hoon Kwon, Young Mok Park, Hyoung Joo Lee, Young Ki Paik, Jin Young Kim, Jong Shin Yoo

Research output: Contribution to journal (Article)

6 Citations (Scopus)

Abstract

The Human Proteome Project aims to map all human proteins, including missing proteins and proteoforms arising from post-translational modifications, alternative splicing variants (ASVs), and single amino acid variants (SAAVs). The neXtProt and Ensembl databases are commonly used to provide curated information on human coding genes. To find these proteoforms, we (the Chr #11 team) introduce a streamlined pipeline that uses a customized, concatenated database built from neXtProt and GENCODE (which originates from Ensembl) under a controlled false discovery rate (FDR). Because of the large size of the databases used in this pipeline, we found that more stringent FDR filtering (0.1% at the peptide level and 1% at the protein level) was needed to claim novel findings, such as GENCODE ASVs and missing proteins, from a human hippocampus data set (MSV000081385; ProteomeXchange PXD007166). Using our next generation proteomic pipeline (nextPP) with the neXtProt and GENCODE databases, two missing proteins, activity-regulated cytoskeleton-associated protein (ARC, Chr 8) and glutamate receptor ionotropic, kainate 5 (GRIK5, Chr 19), were additionally identified with two or more unique peptides from human brain tissues. In addition, by applying the pipeline to human brain-related data sets such as cortex (PXD000067 and PXD000561), spinal cord, and fetal brain (PXD000561), seven GENCODE ASVs were identified in two or more data sets: ACTN4-012 (Chr 19), DPYSL2-005 (Chr 8), MPRIP-003 (Chr 17), NCAM1-013 (Chr 11), EPB41L1-017 (Chr 20), AGAP1-004 (Chr 2), and CPNE5-005 (Chr 6). The identified peptides of the GENCODE ASVs mapped onto novel exon insertions, alternative translations at the 5′-untranslated region, or novel protein coding sequences. Applying the pipeline to male reproductive organ-related data sets identified 52 GENCODE ASVs from two testis data sets (PXD000561 and PXD002179) and one spermatozoa data set (PXD003947). Four of the 52 GENCODE ASVs, RAB11FIP5-008 (Chr 2), RP13-347D8.7-001 (Chr X), PRDX4-002 (Chr X), and RP11-666A8.13-001 (Chr 17), were identified in all three samples.
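
The abstract does not say how the FDR was estimated. As a minimal sketch, assuming a conventional target-decoy search with a score where higher is better, a peptide-level cutoff such as the 0.1% quoted above could be applied as follows. The `Match` record and the `filter_by_fdr` function are hypothetical names used only for illustration and are not part of nextPP.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    """One peptide identification (hypothetical record, not a nextPP type)."""
    peptide: str
    score: float      # assumption: higher score means a better match
    is_decoy: bool    # True if the hit came from the decoy database


def filter_by_fdr(matches, max_fdr):
    """Return target matches whose q-value is at or below max_fdr.

    FDR at each rank is estimated as (#decoys / #targets) among all
    matches scoring at least as well; the q-value is the smallest FDR
    at that rank or at any lower-scoring rank.
    """
    ranked = sorted(matches, key=lambda m: m.score, reverse=True)

    targets = decoys = 0
    fdrs = []
    for m in ranked:
        if m.is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # Convert raw FDR estimates to monotone q-values, scanning from the
    # worst-scoring match upwards.
    q_values = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        q_values.append(running_min)
    q_values.reverse()

    return [m for m, q in zip(ranked, q_values)
            if not m.is_decoy and q <= max_fdr]
```

Under these assumptions, `filter_by_fdr(peptide_matches, 0.001)` would correspond to the 0.1% peptide-level threshold, and the same function applied to protein-level scores with `0.01` would correspond to the 1% protein-level threshold.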

Original language: English
Pages (from-to): 4425-4434
Number of pages: 10
Journal: Journal of Proteome Research
Volume: 16
Issue number: 12
DOI: https://doi.org/10.1021/acs.jproteome.7b00223
Publication status: Published - 2017 Dec 1

Fingerprint

Chromosomes
Proteomics
Alternative Splicing
Pipelines
Databases
Research
Proteins
Brain
Peptides
Ionotropic Glutamate Receptors
AIDS-Related Complex
5' Untranslated Regions
Proteome
Post Translational Protein Processing
Cytoskeleton
Spermatozoa
Testis
Exons
Hippocampus
Spinal Cord

All Science Journal Classification (ASJC) codes

  • Biochemistry
  • Chemistry(all)

Cite this

Hwang, Heeyoun ; Park, Gun Wook ; Park, Ji Yeong ; Lee, Hyun Kyoung ; Lee, Ju Yeon ; Jeong, Ji Eun ; Park, Sung Kyu Robin ; Yates, John R. ; Kwon, Kyung Hoon ; Park, Young Mok ; Lee, Hyoung Joo ; Paik, Young Ki ; Kim, Jin Young ; Yoo, Jong Shin. / Next Generation Proteomic Pipeline for Chromosome-Based Proteomic Research Using NeXtProt and GENCODE Databases. In: Journal of Proteome Research. 2017 ; Vol. 16, No. 12. pp. 4425-4434.
@article{3afd0dec023046efa789b029bde7085f,
title = "Next Generation Proteomic Pipeline for Chromosome-Based Proteomic Research Using NeXtProt and GENCODE Databases",
abstract = "The Human Proteome Project aims to map all human proteins, including missing proteins and proteoforms arising from post-translational modifications, alternative splicing variants (ASVs), and single amino acid variants (SAAVs). The neXtProt and Ensembl databases are commonly used to provide curated information on human coding genes. To find these proteoforms, we (the Chr #11 team) introduce a streamlined pipeline that uses a customized, concatenated database built from neXtProt and GENCODE (which originates from Ensembl) under a controlled false discovery rate (FDR). Because of the large size of the databases used in this pipeline, we found that more stringent FDR filtering (0.1{\%} at the peptide level and 1{\%} at the protein level) was needed to claim novel findings, such as GENCODE ASVs and missing proteins, from a human hippocampus data set (MSV000081385; ProteomeXchange PXD007166). Using our next generation proteomic pipeline (nextPP) with the neXtProt and GENCODE databases, two missing proteins, activity-regulated cytoskeleton-associated protein (ARC, Chr 8) and glutamate receptor ionotropic, kainate 5 (GRIK5, Chr 19), were additionally identified with two or more unique peptides from human brain tissues. In addition, by applying the pipeline to human brain-related data sets such as cortex (PXD000067 and PXD000561), spinal cord, and fetal brain (PXD000561), seven GENCODE ASVs were identified in two or more data sets: ACTN4-012 (Chr 19), DPYSL2-005 (Chr 8), MPRIP-003 (Chr 17), NCAM1-013 (Chr 11), EPB41L1-017 (Chr 20), AGAP1-004 (Chr 2), and CPNE5-005 (Chr 6). The identified peptides of the GENCODE ASVs mapped onto novel exon insertions, alternative translations at the 5′-untranslated region, or novel protein coding sequences. Applying the pipeline to male reproductive organ-related data sets identified 52 GENCODE ASVs from two testis data sets (PXD000561 and PXD002179) and one spermatozoa data set (PXD003947). Four of the 52 GENCODE ASVs, RAB11FIP5-008 (Chr 2), RP13-347D8.7-001 (Chr X), PRDX4-002 (Chr X), and RP11-666A8.13-001 (Chr 17), were identified in all three samples.",
author = "Heeyoun Hwang and Park, {Gun Wook} and Park, {Ji Yeong} and Lee, {Hyun Kyoung} and Lee, {Ju Yeon} and Jeong, {Ji Eun} and Park, {Sung Kyu Robin} and Yates, {John R.} and Kwon, {Kyung Hoon} and Park, {Young Mok} and Lee, {Hyoung Joo} and Paik, {Young Ki} and Kim, {Jin Young} and Yoo, {Jong Shin}",
year = "2017",
month = "12",
day = "1",
doi = "10.1021/acs.jproteome.7b00223",
language = "English",
volume = "16",
pages = "4425--4434",
journal = "Journal of Proteome Research",
issn = "1535-3893",
publisher = "American Chemical Society",
number = "12",

}

Hwang, H, Park, GW, Park, JY, Lee, HK, Lee, JY, Jeong, JE, Park, SKR, Yates, JR, Kwon, KH, Park, YM, Lee, HJ, Paik, YK, Kim, JY & Yoo, JS 2017, 'Next Generation Proteomic Pipeline for Chromosome-Based Proteomic Research Using NeXtProt and GENCODE Databases', Journal of Proteome Research, vol. 16, no. 12, pp. 4425-4434. https://doi.org/10.1021/acs.jproteome.7b00223

TY - JOUR
T1 - Next Generation Proteomic Pipeline for Chromosome-Based Proteomic Research Using NeXtProt and GENCODE Databases
AU - Hwang, Heeyoun
AU - Park, Gun Wook
AU - Park, Ji Yeong
AU - Lee, Hyun Kyoung
AU - Lee, Ju Yeon
AU - Jeong, Ji Eun
AU - Park, Sung Kyu Robin
AU - Yates, John R.
AU - Kwon, Kyung Hoon
AU - Park, Young Mok
AU - Lee, Hyoung Joo
AU - Paik, Young Ki
AU - Kim, Jin Young
AU - Yoo, Jong Shin
PY - 2017/12/1
Y1 - 2017/12/1
AB - The Human Proteome Project aims to map all human proteins, including missing proteins and proteoforms arising from post-translational modifications, alternative splicing variants (ASVs), and single amino acid variants (SAAVs). The neXtProt and Ensembl databases are commonly used to provide curated information on human coding genes. To find these proteoforms, we (the Chr #11 team) introduce a streamlined pipeline that uses a customized, concatenated database built from neXtProt and GENCODE (which originates from Ensembl) under a controlled false discovery rate (FDR). Because of the large size of the databases used in this pipeline, we found that more stringent FDR filtering (0.1% at the peptide level and 1% at the protein level) was needed to claim novel findings, such as GENCODE ASVs and missing proteins, from a human hippocampus data set (MSV000081385; ProteomeXchange PXD007166). Using our next generation proteomic pipeline (nextPP) with the neXtProt and GENCODE databases, two missing proteins, activity-regulated cytoskeleton-associated protein (ARC, Chr 8) and glutamate receptor ionotropic, kainate 5 (GRIK5, Chr 19), were additionally identified with two or more unique peptides from human brain tissues. In addition, by applying the pipeline to human brain-related data sets such as cortex (PXD000067 and PXD000561), spinal cord, and fetal brain (PXD000561), seven GENCODE ASVs were identified in two or more data sets: ACTN4-012 (Chr 19), DPYSL2-005 (Chr 8), MPRIP-003 (Chr 17), NCAM1-013 (Chr 11), EPB41L1-017 (Chr 20), AGAP1-004 (Chr 2), and CPNE5-005 (Chr 6). The identified peptides of the GENCODE ASVs mapped onto novel exon insertions, alternative translations at the 5′-untranslated region, or novel protein coding sequences. Applying the pipeline to male reproductive organ-related data sets identified 52 GENCODE ASVs from two testis data sets (PXD000561 and PXD002179) and one spermatozoa data set (PXD003947). Four of the 52 GENCODE ASVs, RAB11FIP5-008 (Chr 2), RP13-347D8.7-001 (Chr X), PRDX4-002 (Chr X), and RP11-666A8.13-001 (Chr 17), were identified in all three samples.
UR - http://www.scopus.com/inward/record.url?scp=85037329943&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85037329943&partnerID=8YFLogxK
U2 - 10.1021/acs.jproteome.7b00223
DO - 10.1021/acs.jproteome.7b00223
M3 - Article
C2 - 28965411
AN - SCOPUS:85037329943
VL - 16
SP - 4425
EP - 4434
JO - Journal of Proteome Research
JF - Journal of Proteome Research
SN - 1535-3893
IS - 12
ER -