Using expression quantitative trait loci data and graph-embedded neural networks to uncover genotype-phenotype interactions

Cited by: 1

Authors:

Guo, Xinpeng [1,2]

Han, Jinyu [3]

Song, Yafei [2]

Yin, Zhilei [1]

Liu, Shuaichen [4]

Shang, Xuequn [1]

Affiliations:

[1] Northwestern Polytech Univ, Sch Comp Sci & Engn, Xian, Peoples R China

[2] Air Force Engn Univ, Sch Air & Missile Def, Xian, Peoples R China

[3] Changan Univ, Sch Econ & Management, Xian, Peoples R China

[4] Northwestern Polytech Univ, Sch Marine Sci & Technol, Xian, Peoples R China

Source:

FRONTIERS IN GENETICS | 2022 / Volume 13

Funding:

National Natural Science Foundation of China;

Keywords:

eQTL; expression quantitative trait loci; graph-embedded deep neural network; genotype-phenotype; SNP; gene; INTEGRATION; GWAS;

DOI:

10.3389/fgene.2022.921775

Chinese Library Classification (CLC) number:

Q3 [Genetics];

Discipline classification codes:

071007 ; 090102 ;

Abstract:

Motivation: A central goal of current biology is to establish a complete functional link between genotype and phenotype, known as the genotype-phenotype map. With the continuous development of high-throughput technology and the decline in sequencing costs, multi-omics analysis has become more widely employed. While this offers new opportunities to uncover the correlation mechanisms among single-nucleotide polymorphisms (SNPs), genes, and phenotypes, multi-omics analysis still faces certain challenges: 1) when the sample size is large enough, the number of omics types is often too small to meet the requirements of multi-omics analysis; 2) the internal correlations within each omics layer are often unclear, such as the correlations between genes in genomics; 3) when a large number of traits (p) is analyzed, the sample size (n) is often much smaller than p (n << p), which hinders the application of machine learning methods to the classification of disease outcomes.

Results: To address these issues and build a robust classification model, we propose a graph-embedded deep neural network (G-EDNN) based on expression quantitative trait loci (eQTL) data, which achieves sparse connectivity between network layers to prevent overfitting. The correlations within each omics layer are also considered so that the model more closely resembles biological reality. To verify the capabilities of this method, we conducted experiments on the GSE28127 and GSE95496 data sets from the Gene Expression Omnibus (GEO) database, tested various neural network architectures, and used prior data for feature selection and graph embedding. The results show that the proposed method can achieve high classification accuracy and easy-to-interpret feature selection. This method extends genotype-phenotype association analysis to deep learning networks.
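The idea of sparse, graph-embedded connectivity described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' G-EDNN implementation: the layer sizes, the eQTL pairs, the gene-gene edges, and the helper names `build_mask` and `masked_dense` are all hypothetical and chosen only for illustration. The sketch assumes that a SNP feeds a gene unit only when an eQTL pair links them, and that a gene unit feeds another gene unit only when the two share an edge in a prior gene network.

```python
import numpy as np


def build_mask(pairs, n_in, n_out):
    """Return a binary (n_in, n_out) mask with 1 where a connection is allowed."""
    mask = np.zeros((n_in, n_out))
    for i, j in pairs:
        mask[i, j] = 1.0
    return mask


def masked_dense(x, weights, bias, mask):
    """Dense layer whose weights are zeroed wherever the mask is 0, followed by ReLU."""
    return np.maximum(0.0, x @ (weights * mask) + bias)


rng = np.random.default_rng(0)
n_samples, n_snps, n_genes = 4, 6, 3

# Hypothetical eQTL pairs (snp_index, gene_index); real pairs would come from eQTL data.
eqtl_pairs = [(0, 0), (1, 0), (2, 1), (3, 1), (4, 2), (5, 2)]
# Hypothetical gene-gene edges from a prior network, plus self-loops.
gene_edges = [(0, 1), (1, 0), (1, 2), (2, 1)] + [(g, g) for g in range(n_genes)]

snp_gene_mask = build_mask(eqtl_pairs, n_snps, n_genes)
gene_gene_mask = build_mask(gene_edges, n_genes, n_genes)

# Toy genotypes coded 0/1/2 and randomly initialised weights (no training here).
x = rng.integers(0, 3, size=(n_samples, n_snps)).astype(float)
w1, b1 = rng.normal(size=(n_snps, n_genes)), np.zeros(n_genes)
w2, b2 = rng.normal(size=(n_genes, n_genes)), np.zeros(n_genes)

h1 = masked_dense(x, w1, b1, snp_gene_mask)    # SNP layer -> gene layer
h2 = masked_dense(h1, w2, b2, gene_gene_mask)  # gene layer -> gene layer

# Only the masked-in weights are active, far fewer than a fully connected layer.
n_active = int(snp_gene_mask.sum())
n_dense = n_snps * n_genes
```

In this toy setting only 6 of the 18 possible SNP-to-gene weights are active, which is the kind of parameter reduction that may help when n << p. In a trainable version the same mask would also need to be applied to the gradient updates so that masked-out weights stay at zero.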


Pages: 10

