An important goal of the Human Proteome Organization (HUPO) Chromosome-centric Human Proteome Project (C-HPP) is to correctly define the number of canonical proteins encoded by their cognate open reading frames on each chromosome in the human genome. When identified with high-confidence protein evidence (PE), such proteins are termed PE1 proteins in the online database resource neXtProt. However, proteins that have not been identified unequivocally at the protein level but that have other evidence suggestive of their existence (PE2-PE4) are termed missing proteins (MPs). The number of MPs has been reduced from 5511 in 2012 to 2186 in 2018 (neXtProt 2018-01-17 release). Although the annotation of the human proteome has made significant progress, the "parts list" alone does not inform function. Indeed, 1937 proteins representing ∼10% of the human proteome have no function either annotated from experimental characterization or predicted by homology to other proteins. Specifically, these 1937 "dark proteins" of the so-called dark proteome comprise 1260 functionally uncharacterized but identified PE1 proteins, designated uPE1, plus 677 MPs from categories PE2-PE4 that also have no known or predicted function, termed uMPs. At the HUPO-2017 Annual Meeting, the C-HPP officially adopted the uPE1 pilot initiative, with 14 participating international teams later committing to demonstrate the feasibility of functionally characterizing large numbers of dark proteins, starting with 50 uPE1 proteins, in a stepwise chromosome-centric organizational manner. The second aim of this feasibility phase, which sets out to characterize protein (CP) functions of 50 uPE1 proteins and is termed the neXt-CP50 initiative, is to utilize a variety of approaches and workflows according to individual team expertise, interest, and resources so as to enable the C-HPP to recommend experimentally proven workflows to the proteome community within 3 years. 
The results from this pilot will not only be the cornerstone of a larger characterization initiative but also enhance understanding of the human proteome and integrated cellular networks for the discovery of new mechanisms of pathology, mechanistically informative biomarkers, and rational drug targets.
Funding Information:
We thank proteomics scientists, leaders of public databases, C-HPP investigators and associate members, and funding agencies. This paper is dedicated to all of the C-HPP members as well as all other related investigators who contributed their efforts and data to move this global project forward in various ways. This work was supported by grants from the Korean Ministry of Health and Welfare: [HI13C2098]-International Consortium Project and [HI16C0257] (awarded to Y.-K.P.); from SIB Swiss Institute of Bioinformatics; from the Canadian Institutes of Health Research, 7-year Foundation Grant, and a Canada Research Chair in Protease Proteomics and Systems Biology: [FDN-148408] (awarded to C.M.O.); and from the National Institutes of Health: P30 ES017885 and U24CA210967 (awarded to G.S.O.).