Haploinsufficiency, wherein a single functional copy of a gene is insufficient to maintain normal function, is a major cause of dominant disease. Human disease studies have identified several hundred haploinsufficient (HI) genes. We have compiled a map of 1,079 haplosufficient (HS) genes by systematically identifying genes unambiguously and repeatedly compromised by copy number variation among 8,458 apparently healthy individuals, and we contrasted the genomic, evolutionary, functional, and network properties of these HS genes with those of known HI genes. We found that HI genes are typically longer and have more conserved coding sequences and promoters than HS genes. HI genes exhibit higher levels of expression during early development and greater tissue specificity. Moreover, within a probabilistic human functional interaction network, HI genes have more interaction partners and greater network proximity to other known HI genes. We built a predictive model on the basis of these differences and annotated 12,443 genes with their predicted probability of being haploinsufficient. We validated these predictions by demonstrating that genes with a high predicted probability of haploinsufficiency are enriched among genes implicated in human dominant diseases and among genes causing abnormal phenotypes in heterozygous knockout mice. We have transformed these gene-based predictions into haploinsufficiency scores for genic deletions, which we demonstrate discriminate between pathogenic and benign deletions better than deletion size or the number of genes deleted. These robust predictions of haploinsufficiency support clinical interpretation of novel loss-of-function variants and prioritization of variants and genes for follow-up studies.
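As an illustration of how per-gene probabilities might be combined into a score for a deletion, the sketch below computes the probability that at least one deleted gene is haploinsufficient. This is one plausible aggregation, not necessarily the scoring used in the study: the abstract does not give the formula, the function name `deletion_hi_score` is hypothetical, and treating genes as independent is an assumption made here for simplicity.

```python
import math
from typing import Iterable


def deletion_hi_score(gene_probs: Iterable[float]) -> float:
    """Probability that at least one deleted gene is haploinsufficient.

    ASSUMPTION: per-gene probabilities are treated as independent, so the
    score is 1 - prod(1 - p_i). This is an illustrative aggregation, not
    a formula taken from the abstract.
    """
    probs = list(gene_probs)
    for p in probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
    # Probability that none of the deleted genes is haploinsufficient.
    none_hi = math.prod(1.0 - p for p in probs)
    return 1.0 - none_hi


# Hypothetical per-gene probabilities for the genes covered by one deletion.
example_score = deletion_hi_score([0.5, 0.5])
```

Under this assumption, a deletion covering many genes with low predicted probabilities can still score lower than a deletion covering a single gene with a high predicted probability, which is the kind of distinction that deletion size or gene count alone cannot make.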
All Science Journal Classification (ASJC) codes
- Ecology, Evolution, Behavior and Systematics
- Molecular Biology
- Cancer Research