Using expression quantitative trait loci data and graph-embedded neural networks to uncover genotype-phenotype interactions

Cited by: 1
Authors
Guo, Xinpeng [1, 2]
Han, Jinyu [3]
Song, Yafei [2]
Yin, Zhilei [1]
Liu, Shuaichen [4]
Shang, Xuequn [1]
Affiliations
[1] Northwestern Polytech Univ, Sch Comp Sci & Engn, Xian, Peoples R China
[2] Air Force Engn Univ, Sch Air & Missile Def, Xian, Peoples R China
[3] Changan Univ, Sch Econ & Management, Xian, Peoples R China
[4] Northwestern Polytech Univ, Sch Marine Sci & Technol, Xian, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
eQTL; expression quantitative trait loci; graph-embedded deep neural network; genotype-phenotype; SNP; gene; INTEGRATION; GWAS;
DOI
10.3389/fgene.2022.921775
Chinese Library Classification (CLC) number
Q3 [Genetics];
Discipline classification code
071007; 090102;
Abstract
Motivation: A central goal of current biology is to establish a complete functional link between genotype and phenotype, known as the genotype-phenotype map. With the continuous development of high-throughput technology and the decline in sequencing costs, multi-omics analysis has become more widely used. While this offers new opportunities to uncover the mechanisms linking single-nucleotide polymorphisms (SNPs), genes, and phenotypes, multi-omics analysis still faces several challenges: 1) even when the sample size is large enough, the number of omics types is often too small to meet the requirements of multi-omics analysis; 2) the correlations within each omics type, such as those between genes in genomics, are often unclear; 3) when a large number of traits (p) is analyzed, the sample size (n) is often far smaller than p (n << p), which hinders the application of machine learning methods to the classification of disease outcomes.
Results: To address these issues and build a robust classification model, we propose a graph-embedded deep neural network (G-EDNN) based on expression quantitative trait loci (eQTL) data, which achieves sparse connectivity between network layers to prevent overfitting. The correlations within each omics type are also considered so that the model more closely resembles biological reality. To verify the capabilities of this method, we conducted experimental analysis using the GSE28127 and GSE95496 data sets from the Gene Expression Omnibus (GEO) database, tested various neural network architectures, and used prior data for feature selection and graph embedding. The results show that the proposed method can achieve high classification accuracy and easy-to-interpret feature selection. This method extends genotype-phenotype association analysis to deep learning networks.
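The idea of sparse connectivity between network layers can be illustrated with a minimal sketch. It is not the authors' G-EDNN implementation. It assumes that a binary SNP-to-gene adjacency matrix (here a made-up `mask`) is available from eQTL data, and that a layer simply zeroes every weight without a corresponding edge. The function name `masked_dense_forward`, the toy dimensions, and the ReLU activation are all assumptions chosen for illustration.

```python
import numpy as np

def masked_dense_forward(x, weights, bias, mask):
    """Forward pass of one sparsely connected layer (illustrative only)."""
    # Element-wise masking removes every weight that has no edge in the graph,
    # so each output unit (gene) only sees its linked inputs (SNPs).
    return np.maximum(0.0, x @ (weights * mask) + bias)

rng = np.random.default_rng(0)
n_samples, n_snps, n_genes = 4, 6, 3

# Hypothetical eQTL adjacency: mask[i, j] = 1 if SNP i is linked to gene j.
mask = np.zeros((n_snps, n_genes))
mask[[0, 1], 0] = 1
mask[[2, 3], 1] = 1
mask[[4, 5], 2] = 1

# Toy genotype matrix coded as 0/1/2 minor-allele counts (assumed encoding).
x = rng.integers(0, 3, size=(n_samples, n_snps)).astype(float)
weights = rng.normal(size=(n_snps, n_genes))
bias = np.zeros(n_genes)

hidden = masked_dense_forward(x, weights, bias, mask)
print(hidden.shape)
```

In this toy setup only 6 of the 18 possible SNP-to-gene weights are active, which is the sense in which a graph-derived mask reduces the number of free parameters when n << p.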
Pages: 10