GenomewidePDB 2.0

A newly upgraded versatile proteogenomic database for the chromosome-centric human proteome project

Seul Ki Jeong, William S. Hancock, Young-Ki Paik

Research output: Contribution to journal › Article

8 Citations (Scopus)

Abstract

Since the launch of the Chromosome-centric Human Proteome Project (C-HPP) in 2012, the number of "missing" proteins has fallen to 2932, down from the ∼5932 first counted in 2011. We compared the characteristics of missing proteins with those of already annotated proteins with respect to transcriptional expression pattern and the time periods in which newly identified proteins were annotated. We learned that missing proteins commonly exhibit lower levels of transcriptional expression and less tissue-specific expression compared with already annotated proteins. This makes it more difficult to identify missing proteins as time goes on. One of the C-HPP goals is to identify alternatively spliced products of proteins (ASPs), which are usually difficult to find by shotgun proteomic methods due to their sequence similarities with the representative proteins. To resolve this problem, it may be necessary to use a targeted proteomics approach (e.g., selected and multiple reaction monitoring [S/MRM] assays) and an innovative bioinformatics platform that enables the selection of target peptides for rarely expressed missing proteins or ASPs. Given that the success of efforts to identify missing proteins may rely on more informative public databases, it was necessary to upgrade the available integrative databases. To this end, we attempted to improve the features and utility of GenomewidePDB by integrating transcriptomic information (e.g., alternatively spliced transcripts), annotated peptide information, and an advanced search interface that can find proteins of interest when applying a targeted proteomics strategy. This upgraded version of the database, GenomewidePDB 2.0, may not only expedite identification of the remaining missing proteins but also enhance the exchange of information among the proteome community. GenomewidePDB 2.0 is available publicly at http://genomewidepdb.proteomix.org/.

Original language: English
Pages (from-to): 3710-3719
Number of pages: 10
Journal: Journal of Proteome Research
Volume: 14
Issue number: 9
DOIs: 10.1021/acs.jproteome.5b00541
Publication status: Published - 2015 Jan 1

Fingerprint

Human Chromosomes
Proteome
Chromosomes
Databases
Proteins
Proteomics
Proteogenomics
Peptides
Bioinformatics
Computational Biology
Assays

All Science Journal Classification (ASJC) codes

  • Biochemistry
  • Chemistry (all)

Cite this

@article{41ef7e2b174c4fd1ad79ce7548ad1d0c,
title = "GenomewidePDB 2.0: A newly upgraded versatile proteogenomic database for the chromosome-centric human proteome project",
abstract = "Since the launch of the Chromosome-centric Human Proteome Project (C-HPP) in 2012, the number of {"}missing{"} proteins has fallen to 2932, down from ∼5932 since the number was first counted in 2011. We compared the characteristics of missing proteins with those of already annotated proteins with respect to transcriptional expression pattern and the time periods in which newly identified proteins were annotated. We learned that missing proteins commonly exhibit lower levels of transcriptional expression and less tissue-specific expression compared with already annotated proteins. This makes it more difficult to identify missing proteins as time goes on. One of the C-HPP goals is to identify alternative spliced product of proteins (ASPs), which are usually difficult to find by shot-gun proteomic methods due to their sequence similarities with the representative proteins. To resolve this problem, it may be necessary to use a targeted proteomics approach (e.g., selected and multiple reaction monitoring [S/MRM] assays) and an innovative bioinformatics platform that enables the selection of target peptides for rarely expressed missing proteins or ASPs. Given that the success of efforts to identify missing proteins may rely on more informative public databases, it was necessary to upgrade the available integrative databases. To this end, we attempted to improve the features and utility of GenomewidePDB by integrating transcriptomic information (e.g., alternatively spliced transcripts), annotated peptide information, and an advanced search interface that can find proteins of interest when applying a targeted proteomics strategy. This upgraded version of the database, GenomewidePDB 2.0, may not only expedite identification of the remaining missing proteins but also enhance the exchange of information among the proteome community. GenomewidePDB 2.0 is available publicly at http://genomewidepdb.proteomix.org/.",
author = "Jeong, {Seul Ki} and Hancock, {William S.} and Paik, Young-Ki",
year = "2015",
month = "1",
day = "1",
doi = "10.1021/acs.jproteome.5b00541",
language = "English",
volume = "14",
pages = "3710--3719",
journal = "Journal of Proteome Research",
issn = "1535-3893",
publisher = "American Chemical Society",
number = "9",

}

GenomewidePDB 2.0: A newly upgraded versatile proteogenomic database for the chromosome-centric human proteome project. / Jeong, Seul Ki; Hancock, William S.; Paik, Young-Ki.

In: Journal of Proteome Research, Vol. 14, No. 9, 01.01.2015, p. 3710-3719.

TY - JOUR

T1 - GenomewidePDB 2.0

T2 - A newly upgraded versatile proteogenomic database for the chromosome-centric human proteome project

AU - Jeong, Seul Ki

AU - Hancock, William S.

AU - Paik, Young-Ki

PY - 2015/1/1

Y1 - 2015/1/1

AB - Since the launch of the Chromosome-centric Human Proteome Project (C-HPP) in 2012, the number of "missing" proteins has fallen to 2932, down from ∼5932 since the number was first counted in 2011. We compared the characteristics of missing proteins with those of already annotated proteins with respect to transcriptional expression pattern and the time periods in which newly identified proteins were annotated. We learned that missing proteins commonly exhibit lower levels of transcriptional expression and less tissue-specific expression compared with already annotated proteins. This makes it more difficult to identify missing proteins as time goes on. One of the C-HPP goals is to identify alternative spliced product of proteins (ASPs), which are usually difficult to find by shot-gun proteomic methods due to their sequence similarities with the representative proteins. To resolve this problem, it may be necessary to use a targeted proteomics approach (e.g., selected and multiple reaction monitoring [S/MRM] assays) and an innovative bioinformatics platform that enables the selection of target peptides for rarely expressed missing proteins or ASPs. Given that the success of efforts to identify missing proteins may rely on more informative public databases, it was necessary to upgrade the available integrative databases. To this end, we attempted to improve the features and utility of GenomewidePDB by integrating transcriptomic information (e.g., alternatively spliced transcripts), annotated peptide information, and an advanced search interface that can find proteins of interest when applying a targeted proteomics strategy. This upgraded version of the database, GenomewidePDB 2.0, may not only expedite identification of the remaining missing proteins but also enhance the exchange of information among the proteome community. GenomewidePDB 2.0 is available publicly at http://genomewidepdb.proteomix.org/.

UR - http://www.scopus.com/inward/record.url?scp=84941122690&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84941122690&partnerID=8YFLogxK

U2 - 10.1021/acs.jproteome.5b00541

DO - 10.1021/acs.jproteome.5b00541

M3 - Article

VL - 14

SP - 3710

EP - 3719

JO - Journal of Proteome Research

JF - Journal of Proteome Research

SN - 1535-3893

IS - 9

ER -