Abstract
Gene prioritization is the process of determining which variants and genes identified in genetic analyses are likely to cause a disease or a variation in a phenotype. For many genes, neither in vitro nor in vivo testing is available, so assessing their pathogenic role can be challenging and can lead to false-positive or false-negative results. In this paper, we propose a gene prioritization score based on the population of interest. We introduce the concept of singleton-cohort variants (SC variants): variants with an allele count equal to one in the cohort under study. The difference between the normalized count of SC variants in the coding region and the normalized count of SC variants in the non-coding region should indicate the level of constraint on that gene in a specific population. This score is negative when constraints allow SC variants only in the non-coding region, and positive when there are no such constraints. A complementary score is the sum of the normalized SC variant counts in the coding and non-coding regions, which can be used as a proxy for positive or strong purifying selection in a specific population. Our methodology showed a high level of constraint for genes such as USP34 in all subpopulations tested (1000 Genomes dataset). In contrast, some genes showed a highly negative score only in specific populations, e.g., MYT1L in Europeans, UBR5 in East Asians, and FBXO11 in Africans.
Abbreviations
- SC variants: Singleton-cohort variants
- SC_cds: Normalized singleton-cohort variant count in the gene coding region
- SC_ncds: Normalized singleton-cohort variant count in the gene non-coding region
- DSC score: Delta singleton-cohort variant score
- SSC score: Sum singleton-cohort variant score
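The two scores defined above can be illustrated with a minimal sketch. It follows the abstract's description: DSC is the coding normalized count minus the non-coding normalized count, and SSC is their sum. The abstract does not state how counts are normalized, so normalizing by region length (SC variants per 1000 bp) is an assumption made here for illustration only. The names `GeneCounts`, `normalized_count`, and `dsc_ssc` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class GeneCounts:
    """Singleton-cohort (allele count == 1) tallies for one gene in one cohort."""
    sc_coding: int      # SC variants observed in the coding region
    coding_len: int     # coding region length in bp
    sc_noncoding: int   # SC variants observed in the non-coding region
    noncoding_len: int  # non-coding region length in bp


def normalized_count(n_variants: int, region_len: int, per: int = 1000) -> float:
    """Return variants per `per` bp.

    ASSUMPTION: length-based normalization is illustrative; the source
    only says the counts are "normalized".
    """
    if region_len <= 0:
        raise ValueError("region length must be positive")
    return n_variants * per / region_len


def dsc_ssc(gene: GeneCounts) -> tuple[float, float]:
    """Return (DSC, SSC) = (SC_cds - SC_ncds, SC_cds + SC_ncds)."""
    sc_cds = normalized_count(gene.sc_coding, gene.coding_len)
    sc_ncds = normalized_count(gene.sc_noncoding, gene.noncoding_len)
    return sc_cds - sc_ncds, sc_cds + sc_ncds


# Made-up numbers: few coding singletons relative to non-coding ones
# give a negative DSC, the pattern the abstract associates with constraint.
dsc, ssc = dsc_ssc(GeneCounts(sc_coding=2, coding_len=4000,
                              sc_noncoding=30, noncoding_len=20000))
print(f"DSC = {dsc:.2f}, SSC = {ssc:.2f}")
```

With these invented inputs the coding rate is 0.5 and the non-coding rate is 1.5 per 1000 bp, so DSC is negative. Running the same calculation per population would give the population-specific scores the abstract describes.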
Acknowledgements
A sincere thank you to Veronika Collovati and Eleonora Bernucci for proofreading this manuscript. We would like to thank the reviewers for their insightful comments.
Funding
This research was funded by the Italian Ministry of Health (5 × 1000 to Institute for Maternal and Child Health IRCCS “Burlo Garofolo”). The funders had no role in the design of the study, data collection and analysis, decision to publish, or preparation of the manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Mezzavilla, M., Cocca, M., Guidolin, F. et al. A population-based approach for gene prioritization in understanding complex traits. Hum Genet 139, 647–655 (2020). https://doi.org/10.1007/s00439-020-02152-4
DOI: https://doi.org/10.1007/s00439-020-02152-4