Predicting target genes of non-coding regulatory variants with IRT

Cited by: 4
Authors
Wu, Zhenqin [1, 2]
Ioannidis, Nilah M. [2]
Zou, James [2, 3]
Affiliations
[1] Stanford Univ, Dept Chem, Stanford, CA 94305 USA
[2] Stanford Univ, Dept Biomed Data Sci, Sch Med, Stanford, CA 94305 USA
[3] Chan Zuckerberg Biohub, San Francisco, CA 94158 USA
Funding
National Science Foundation (USA); National Institutes of Health (USA)
Keywords
GENOME-WIDE ASSOCIATION; HUMAN PIGMENTATION; EXPRESSION; ANNOTATION; IRF4; MC1R; IDENTIFICATION; FRAMEWORK; IMPACT; LOCI;
DOI
10.1093/bioinformatics/btaa254
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Interpreting genetic variants of unknown significance (VUS) is essential in clinical applications of genome sequencing for diagnosis and personalized care. Non-coding variants remain particularly difficult to interpret, even though they make up a large majority of the trait associations identified in genome-wide association studies (GWAS). Predicting the regulatory effects of non-coding variants on candidate genes is a key step in evaluating their clinical significance. Here, we develop a machine-learning algorithm, Inference of Connected expression quantitative trait loci (eQTLs) (IRT), to predict the regulatory targets of non-coding variants identified in studies of eQTLs. We assemble datasets using eQTL results from the Genotype-Tissue Expression (GTEx) project and learn to separate positive and negative variant-gene pairs based on annotations characterizing the variant, the gene and the intermediate sequence. IRT achieves an area under the receiver operating characteristic curve (ROC-AUC) of 0.799 using random cross-validation, and 0.700 using a more stringent position-based cross-validation. Further evaluation on rare variants and experimentally validated regulatory variants shows that IRT is significantly enriched for identifying the true target genes versus negative controls. In gene-ranking experiments, IRT achieves a top-1 accuracy of 50% and a top-3 accuracy of 90%. Salient features, including GC-content, histone modifications and Hi-C interactions, are further analyzed and visualized to illustrate their influence on predictions. IRT can be applied to any VUS of interest and each nearby candidate gene to output a score reflecting the likelihood of a regulatory effect on that gene's expression level. These scores can be used to prioritize variants and genes to assist in patient diagnosis and GWAS follow-up studies.
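The two evaluation metrics named in the abstract, ROC-AUC over variant-gene pairs and top-k accuracy in gene ranking, can be illustrated with a minimal sketch. This is not the authors' code: the function names, the variant and gene identifiers, and the scores below are all hypothetical, and the data are toy values chosen only to show how the metrics are computed from per-pair scores.

```python
from typing import Dict, List


def roc_auc(labels: List[int], scores: List[float]) -> float:
    """ROC-AUC as the probability that a random positive pair
    outscores a random negative pair (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative pair")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def top_k_accuracy(
    scores_by_variant: Dict[str, Dict[str, float]],
    true_gene: Dict[str, str],
    k: int,
) -> float:
    """Fraction of variants whose true target gene is among the
    k highest-scoring candidate genes."""
    hits = 0
    for variant, gene_scores in scores_by_variant.items():
        # Rank candidate genes from highest to lowest score.
        ranked = sorted(gene_scores, key=lambda g: gene_scores[g], reverse=True)
        if true_gene[variant] in ranked[:k]:
            hits += 1
    return hits / len(scores_by_variant)


# Hypothetical scores for two variants and three candidate genes each.
toy_scores = {
    "var1": {"GENE_A": 0.9, "GENE_B": 0.4, "GENE_C": 0.1},
    "var2": {"GENE_A": 0.2, "GENE_B": 0.7, "GENE_C": 0.6},
}
# Hypothetical ground-truth target gene for each variant.
toy_truth = {"var1": "GENE_A", "var2": "GENE_C"}

# Flatten to pair-level labels and scores for ROC-AUC.
pair_labels = [
    1 if gene == toy_truth[variant] else 0
    for variant, genes in toy_scores.items()
    for gene in genes
]
pair_scores = [
    score for genes in toy_scores.values() for score in genes.values()
]

top1 = top_k_accuracy(toy_scores, toy_truth, k=1)
top2 = top_k_accuracy(toy_scores, toy_truth, k=2)
auc = roc_auc(pair_labels, pair_scores)
```

ROC-AUC treats every variant-gene pair independently, while top-k accuracy compares candidate genes only within the same variant, which is why the abstract reports the two separately.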
Pages: 4440-4448
Number of pages: 9