Early detection of Alzheimer's disease using single nucleotide polymorphisms analysis based on gradient boosting tree

Cited by: 20
Authors
Ahmed, Hala [1]
Soliman, Hassan [1]
Elmogy, Mohammed [1]
Affiliations
[1] Mansoura Univ, Fac Comp & Informat, Informat Technol Dept, PO 35516, Mansoura, Egypt
Keywords
AD; Alzheimer's disease; Classification; Prediction; Diagnosis; SNPs; Single nucleotide polymorphisms; GBT; Gradient boosting tree; Boruta feature selection; Information gain feature selection; FEATURE-SELECTION; CANCER CLASSIFICATION; GENE SELECTION; ASSOCIATIONS; REVEALS
DOI
10.1016/j.compbiomed.2022.105622
Chinese Library Classification number
Q [Biological Sciences]
Discipline classification code
07; 0710; 09
Abstract
Alzheimer's disease (AD) is a degenerative disorder that attacks nerve cells in the brain. AD leads to memory loss and to cognitive and intellectual impairments that can affect social activities and decision-making. Single nucleotide polymorphisms (SNPs) are the most common type of human genetic variation and are useful markers of complex gene-disease associations. Many common and serious diseases, including AD, have associated SNPs, so detecting SNP biomarkers linked with AD could help in the early prediction and diagnosis of the disease. The main objective of this paper is to predict and diagnose AD in its early stages, with high classification accuracy, based on SNP biomarkers. One of the most pressing problems is the high number of features. The paper therefore proposes a comprehensive framework for early AD detection and for identifying the most significant genes based on SNP analysis. It also suggests using machine learning (ML) techniques to identify new biomarkers of AD. In the proposed system, two feature selection techniques are evaluated separately for selecting the most significant AD-related genes: the information gain filter and the Boruta wrapper. Filter methods measure the relevance of features by their correlation with the dependent variable, while wrapper methods measure the usefulness of a subset of features by training a model on it. A gradient boosting tree (GBT) classifier was applied to the genetic data of the Alzheimer's Disease Neuroimaging Initiative phase 1 (ADNI-1) and Whole-Genome Sequencing (WGS) datasets, using each of the two feature selection techniques. In the whole-genome approach on ADNI-1, the GBT learning algorithm scored an overall accuracy of 99.06% with Boruta feature selection. With information gain feature selection, the proposed system achieved an average accuracy of 94.87%. The results show that the proposed system is well suited to the early detection of AD and that the Boruta wrapper feature selection is superior to the information gain filter technique.
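The filter step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the SNP identifiers, and the toy genotype data below are all hypothetical, and genotypes are assumed to be coded as minor-allele counts (0, 1, 2). The sketch scores each SNP by its information gain with respect to the case/control label and ranks the SNPs, which is the general idea of an information gain filter.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """Entropy of the labels minus their entropy after splitting on the feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_by_information_gain(columns, labels):
    """Return (name, gain) pairs sorted from most to least informative."""
    gains = {name: information_gain(col, labels) for name, col in columns.items()}
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


# Toy data (invented for illustration): genotypes as minor-allele counts,
# labels 1 = case, 0 = control.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
snps = {
    "snp_a": [2, 2, 1, 1, 0, 0, 0, 0],  # separates the two classes completely
    "snp_b": [0, 1, 0, 1, 0, 1, 0, 1],  # carries no information about the label
    "snp_c": [1, 1, 1, 0, 0, 0, 0, 1],  # partially informative
}

ranking = rank_by_information_gain(snps, labels)
for name, gain in ranking:
    print(f"{name}: {gain:.4f}")
```

In a full pipeline, only the top-ranked SNPs would be passed on to the classifier. A wrapper such as Boruta differs in that it judges features by repeatedly training a model on them, rather than by a per-feature statistic like the one above.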
Pages: 12