Using expression quantitative trait loci data and graph-embedded neural networks to uncover genotype-phenotype interactions

Cited by: 1

Authors:

Guo, Xinpeng [1,2]

Han, Jinyu [3]

Song, Yafei [2]

Yin, Zhilei [1]

Liu, Shuaichen [4]

Shang, Xuequn [1]

Affiliations:

[1] Northwestern Polytech Univ, Sch Comp Sci & Engn, Xian, Peoples R China

[2] Air Force Engn Univ, Sch Air & Missile Def, Xian, Peoples R China

[3] Changan Univ, Sch Econ & Management, Xian, Peoples R China

[4] Northwestern Polytech Univ, Sch Marine Sci & Technol, Xian, Peoples R China

Source:

FRONTIERS IN GENETICS | 2022 / Vol. 13

Funding:

National Natural Science Foundation of China;

Keywords:

eQTL; expression quantitative trait loci; graph-embedded deep neural network; genotype-phenotype; SNP; gene; INTEGRATION; GWAS;

DOI:

10.3389/fgene.2022.921775

Chinese Library Classification (CLC) number:

Q3 [Genetics];

Subject classification code:

071007 ; 090102 ;

Abstract:

Motivation: A central goal of current biology is to establish a complete functional link between genotype and phenotype, known as the genotype-phenotype map. With the continuous development of high-throughput technology and the decline in sequencing costs, multi-omics analysis has become more widely employed. While this offers new opportunities to uncover the correlation mechanisms between single-nucleotide polymorphisms (SNPs), genes, and phenotypes, multi-omics analysis still faces certain challenges, specifically: 1) when the sample size is large enough, the number of omics types is often too small to meet the requirements of multi-omics analysis; 2) the internal correlations within each omics layer are often unclear, such as the correlation between genes in genomics; 3) when analyzing a large number of traits (p), the sample size (n) is often much smaller than p (n << p), which hinders the application of machine learning methods to the classification of disease outcomes. Results: To address these issues and build a robust classification model, we propose a graph-embedded deep neural network (G-EDNN) based on expression quantitative trait loci (eQTL) data, which achieves sparse connectivity between network layers to prevent overfitting. The correlation within each omics layer is also considered, so that the model more closely resembles biological reality. To verify the capabilities of this method, we conducted experimental analysis using the GSE28127 and GSE95496 data sets from the Gene Expression Omnibus (GEO) database, tested various neural network architectures, and used prior data for feature selection and graph embedding. Results show that the proposed method can achieve high classification accuracy and easy-to-interpret feature selection. This method represents an extended application of genotype-phenotype association analysis in deep learning networks.
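
The abstract does not spell out how sparse connectivity between layers is obtained. One common way to realize the idea is to multiply a dense weight matrix element-wise by a binary mask derived from prior knowledge, so that only biologically supported connections carry weight. The NumPy sketch below illustrates that general technique under stated assumptions; it is not the authors' G-EDNN implementation. The function `masked_dense`, the toy matrices `eqtl_mask` and `gene_adj`, and all sizes are hypothetical and chosen only for illustration.

```python
import numpy as np


def masked_dense(x, weights, bias, mask):
    """ReLU dense layer whose weights are zeroed wherever mask == 0."""
    return np.maximum(0.0, x @ (weights * mask) + bias)


# Hypothetical toy prior knowledge: 4 SNPs, 3 genes.
# eqtl_mask[i, j] = 1 means SNP i is assumed to be an eQTL for gene j.
eqtl_mask = np.array([
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 0],
    [0, 0, 1],
], dtype=float)

# gene_adj[j, k] = 1 means genes j and k are assumed to be linked in a
# gene network; the diagonal holds self-loops so each gene keeps its signal.
gene_adj = np.array([
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
], dtype=float)

rng = np.random.default_rng(0)
w_snp_gene = rng.normal(size=eqtl_mask.shape)
w_gene_gene = rng.normal(size=gene_adj.shape)
b_gene = np.zeros(3)
b_hidden = np.zeros(3)

# Five made-up samples with genotypes coded as allele counts 0, 1, or 2.
genotypes = rng.integers(0, 3, size=(5, 4)).astype(float)

# SNP layer -> gene layer (eQTL mask) -> hidden gene layer (network mask).
gene_layer = masked_dense(genotypes, w_snp_gene, b_gene, eqtl_mask)
hidden = masked_dense(gene_layer, w_gene_gene, b_hidden, gene_adj)

# Compare the number of active weights with a fully connected equivalent.
n_active = int(eqtl_mask.sum() + gene_adj.sum())
n_dense = eqtl_mask.size + gene_adj.size
print(f"active weights: {n_active} of {n_dense}")
```

In this toy setup the masks leave fewer than half of the weights active, which is the sense in which masking reduces the number of free parameters when n << p. In a trainable model the same mask would also be applied to the gradient update so that masked weights stay at zero.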


Pages: 10
