Using expression quantitative trait loci data and graph-embedded neural networks to uncover genotype-phenotype interactions

Cited by: 1
Authors
Guo, Xinpeng [1, 2]
Han, Jinyu [3 ]
Song, Yafei [2 ]
Yin, Zhilei [1 ]
Liu, Shuaichen [4 ]
Shang, Xuequn [1 ]
Affiliations
[1] Northwestern Polytech Univ, Sch Comp Sci & Engn, Xian, Peoples R China
[2] Air Force Engn Univ, Sch Air & Missile Def, Xian, Peoples R China
[3] Changan Univ, Sch Econ & Management, Xian, Peoples R China
[4] Northwestern Polytech Univ, Sch Marine Sci & Technol, Xian, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
eQTL; expression quantitative trait loci; graph-embedded deep neural network; genotype-phenotype; SNP; gene; INTEGRATION; GWAS;
DOI
10.3389/fgene.2022.921775
Chinese Library Classification
Q3 [Genetics];
Discipline classification codes
071007 ; 090102 ;
Abstract
Motivation: A central goal of current biology is to establish a complete functional link between genotype and phenotype, known as the genotype-phenotype map. With the continuous development of high-throughput technology and the decline in sequencing costs, multi-omics analysis has become more widely employed. While this offers new opportunities to uncover the correlation mechanisms between single-nucleotide polymorphisms (SNPs), genes, and phenotypes, multi-omics analysis still faces certain challenges, specifically: 1) when the sample size is large enough, the number of omics types is often too small to meet the requirements of multi-omics analysis; 2) the internal correlations within each omics layer are often unclear, such as the correlation between genes in genomics; 3) when analyzing a large number of traits (p), the sample size (n) is often much smaller than p (n << p), which hinders the application of machine learning methods to the classification of disease outcomes. Results: To address these issues and build a robust classification model, we propose a graph-embedded deep neural network (G-EDNN) based on expression quantitative trait loci (eQTL) data, which achieves sparse connectivity between network layers to prevent overfitting. The correlation within each omics layer is also considered, so that the model more closely resembles biological reality. To verify the capabilities of this method, we conducted experimental analysis using the GSE28127 and GSE95496 data sets from the Gene Expression Omnibus (GEO) database, tested various neural network architectures, and used prior data for feature selection and graph embedding. The results show that the proposed method achieves high classification accuracy and easy-to-interpret feature selection. This method represents an extended application of genotype-phenotype association analysis in deep learning networks.
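The abstract does not spell out how the sparse connectivity between layers is obtained. One common way to realize a graph-embedded layer is to mask a weight matrix element-wise with the adjacency matrix of a feature graph, so that each hidden unit only receives input from its graph neighbours. The sketch below illustrates that idea in plain Python on a toy graph; the function names, the self-loop convention, the ReLU activation, and the toy edges are all assumptions made for illustration and are not taken from the paper.

```python
import random


def with_self_loops(edges, n):
    """Build a symmetric 0/1 adjacency matrix with self-loops (assumed convention)."""
    adjacency = [[0] * n for _ in range(n)]
    for i in range(n):
        adjacency[i][i] = 1
    for i, j in edges:
        adjacency[i][j] = 1
        adjacency[j][i] = 1
    return adjacency


def masked_layer_forward(x, weights, bias, adjacency):
    """Compute out[j] = relu(b[j] + sum_i x[i] * W[i][j] * A[i][j]).

    Weights whose adjacency entry is 0 never contribute, so the layer is
    as sparse as the graph itself.
    """
    n = len(x)
    out = []
    for j in range(n):
        total = bias[j]
        for i in range(n):
            if adjacency[i][j]:
                total += x[i] * weights[i][j]
        out.append(max(0.0, total))
    return out


# Toy example: 4 features, hypothetical edges 0-1 and 2-3.
adjacency = with_self_loops([(0, 1), (2, 3)], 4)
rng = random.Random(0)
weights = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(4)]
bias = [0.0] * 4
x = [1.0, 2.0, 0.5, -1.0]

hidden = masked_layer_forward(x, weights, bias, adjacency)
active_weights = sum(sum(row) for row in adjacency)  # 8 of 16 weights are used
```

In this toy setting only 8 of the 16 possible weights are active, and changing feature 2 cannot affect hidden unit 0 because the two are not connected in the graph. With n << p, cutting the number of trainable parameters in this way is one plausible route to the overfitting control the abstract mentions.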
Pages: 10