Enhancing Biogeographical Ancestry Prediction with Deep Learning: A Long Short-Term Memory Approach

Times cited: 0
Authors
Almansour, Fadwa [1]
Alshammari, Abdulaziz [1]
Alqahtani, Fahad [2]
Affiliations
[1] Imam Mohammad Ibn Saud Islamic Univ IMSIU, Riyadh, Saudi Arabia
[2] King Abdulaziz City Sci & Technol, Natl Ctr Genom & Bioinformat, Riyadh, Saudi Arabia
Source
FORTHCOMING NETWORKS AND SUSTAINABILITY IN THE AIOT ERA, VOL 2, FONES-AIOT 2024 | 2024 / Vol. 1036
Keywords
biogeographical ancestry; machine learning; single nucleotide polymorphisms; Bioinformatics; ADMIXTURE; PANEL;
DOI
10.1007/978-3-031-62881-8_6
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Human biogeographical ancestry prediction plays an important role in many domains, such as forensics, where it helps identify missing persons or suspects. Despite the capabilities of deep learning models, few studies have investigated identifying human biogeographical ancestry with deep learning approaches. In this research, we propose a deep learning approach that predicts biogeographical ancestry by distinguishing between seven populations (Africans, Europeans, Central-South Asians, Middle-East Asians, East Asians, Native Americans, and Oceanians). We used a Long Short-Term Memory (LSTM) network to improve on the overall accuracy of current models, especially for genetically similar populations (Europeans, Middle-East Asians, and Central-South Asians). We employed stratified K-fold cross-validation to prevent overfitting and ensure an equal distribution of samples across folds. The results showed that our model outperformed the existing deep learning algorithm, a Convolutional Neural Network (CNN), achieving an overall accuracy of 90.88%.
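The stratified K-fold splitting mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `stratified_k_fold`, the round-robin assignment, and the toy labels are all assumptions made for illustration, and the paper's actual fold count and data are not given here.

```python
from collections import Counter, defaultdict
import random


def stratified_k_fold(labels, k, seed=0):
    """Split sample indices into k folds, spreading each class evenly.

    Hypothetical helper for illustration only: indices of each class are
    shuffled, then dealt round-robin into the k folds.
    """
    rng = random.Random(seed)

    # Group sample indices by their population label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal this class's samples across folds one at a time.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


# Toy labels (assumed, not from the paper): three populations of unequal size.
labels = ["African"] * 10 + ["European"] * 10 + ["East Asian"] * 5
folds = stratified_k_fold(labels, k=5)

# Per-fold class counts; each fold serves once as the held-out test set.
fold_counts = [Counter(labels[i] for i in fold) for fold in folds]
for number, counts in enumerate(fold_counts):
    print(number, dict(counts))
```

Because every class is dealt across the folds separately, each fold keeps roughly the same class proportions as the full data set, which is the property the abstract relies on. When a class size is not divisible by k, the earlier folds receive the extra samples in this sketch.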
Pages: 64-82
Page count: 19