A chromosome-centric human proteome project (C-HPP) to characterize the sets of proteins encoded in chromosome 17

Suli Liu, Hogune Im, Amos Bairoch, Massimo Cristofanilli, Rui Chen, Eric W. Deutsch, Stephen Dalton, David Fenyo, Susan Fanayan, Chris Gates, Pascale Gaudet, Marina Hincapie, Samir Hanash, Hoguen Kim, Seul Ki Jeong, Emma Lundberg, George Mias, Rajasree Menon, Zhaomei Mu, Edouard Nice, Young-Ki Paik, Mathias Uhlen, Lance Wells, Shiaw Lin Wu, Fangfei Yan, Fan Zhang, Yue Zhang, Michael Snyder, Gilbert S. Omenn, Ronald C. Beavis, William S. Hancock

Research output: Contribution to journal (Review article)

30 Citations (Scopus)

Abstract

We report progress assembling the parts list for chromosome 17 and illustrate the various processes that we have developed to integrate available data from diverse genomic and proteomic knowledge bases. As primary resources, we have used GPMDB, neXtProt, PeptideAtlas, Human Protein Atlas (HPA), and GeneCards. All sites share the common resource of Ensembl for the genome modeling information. We have defined the chromosome 17 parts list with the following information: 1169 protein-coding genes; the numbers of proteins confidently identified by various experimental approaches as documented in GPMDB, neXtProt, PeptideAtlas, and HPA; examples of typical data sets obtained by RNASeq and proteomic studies of epithelial-derived tumor cell lines (disease proteome) and a normal proteome (peripheral mononuclear cells); reported evidence of post-translational modifications; and examples of alternative splice variants (ASVs). We have constructed a list of the 59 missing proteins as well as 201 proteins that have inconclusive mass spectrometric (MS) identifications. In this report we have defined a process to establish a baseline for the incorporation of new evidence on protein identification and characterization as well as related information from transcriptome analyses. This initial list of missing proteins will guide the selection of appropriate samples for discovery studies as well as antibody reagents. We have also illustrated the significant diversity of protein variants (including post-translational modifications, PTMs) using regions on chromosome 17 that contain important oncogenes. We emphasize the need for mandated deposition of proteomics data in public databases, the further development of improved PTM, ASV, and single nucleotide variant (SNV) databases, and the construction of Web sites that can integrate and regularly update such information. In addition, we describe the distribution of both clustered and scattered sets of protein families on the chromosome. Since chromosome 17 is rich in cancer-associated genes, we have focused on the clustering of cancer-associated genes in such genomic regions and have used the ERBB2 amplicon as an example of the value of a proteogenomic approach in which one integrates transcriptomic with proteomic information and captures evidence of coexpression through coordinated regulation.

Original language: English
Pages (from-to): 45-57
Number of pages: 13
Journal: Journal of Proteome Research
Volume: 12
Issue number: 1
DOIs: https://doi.org/10.1021/pr300985j
Publication status: Published - 2013 Jan 4

Fingerprint

Chromosomes, Human, Pair 17
Human Chromosomes
Proteome
Chromosomes
Proteins
Proteomics
Post Translational Protein Processing
Genes
Atlases
Neoplasm Genes
Databases
Knowledge Bases
Gene Expression Profiling
Tumor Cell Line
Oncogenes
Cluster Analysis
Websites
Tumors
Nucleotides
Cells

All Science Journal Classification (ASJC) codes

  • Biochemistry
  • Chemistry(all)

Cite this

Liu, S., Im, H., Bairoch, A., Cristofanilli, M., Chen, R., Deutsch, E. W., ... Hancock, W. S. (2013). A chromosome-centric human proteome project (C-HPP) to characterize the sets of proteins encoded in chromosome 17. Journal of Proteome Research, 12(1), 45-57. https://doi.org/10.1021/pr300985j
@article{bdfe3f448ebe46f0808aaa24484b2766,
title = "A chromosome-centric human proteome project (C-HPP) to characterize the sets of proteins encoded in chromosome 17",
author = "Suli Liu and Hogune Im and Amos Bairoch and Massimo Cristofanilli and Rui Chen and Deutsch, {Eric W.} and Stephen Dalton and David Fenyo and Susan Fanayan and Chris Gates and Pascale Gaudet and Marina Hincapie and Samir Hanash and Hoguen Kim and Jeong, {Seul Ki} and Emma Lundberg and George Mias and Rajasree Menon and Zhaomei Mu and Edouard Nice and Young-Ki Paik and Mathias Uhlen and Lance Wells and Wu, {Shiaw Lin} and Fangfei Yan and Fan Zhang and Yue Zhang and Michael Snyder and Omenn, {Gilbert S.} and Beavis, {Ronald C.} and Hancock, {William S.}",
year = "2013",
month = "1",
day = "4",
doi = "10.1021/pr300985j",
language = "English",
volume = "12",
pages = "45--57",
journal = "Journal of Proteome Research",
issn = "1535-3893",
publisher = "American Chemical Society",
number = "1",

}

