A Comprehensive Analysis of Artificial Intelligence Techniques for the Prediction and Prognosis of Genetic Disorders Using Various Gene Disorders

Cited by: 20
Authors
Chaplot, Neelam [1]
Pandey, Dhiraj [2]
Kumar, Yogesh [3]
Sisodia, Pushpendra Singh [4]
Affiliations
[1] Manipal Univ Jaipur, Dept Comp Sci & Engn, Jaipur, Rajasthan, India
[2] JSS Acad Tech Educ, Dept Comp Sci & Engn, Noida, UP, India
[3] Pandit Deendayal Energy Univ, Sch Technol, Dept CSE, Gandhinagar, Gujarat, India
[4] Indus Univ, Dept Comp Engn, IITE, Ahmadabad, Gujarat, India
Keywords
Adaptive boosting - Classification (of information) - Diagnosis - Diseases - Genes - Learning systems - Logistic regression - Mean square error - Random forests - Support vector regression
DOI
10.1007/s11831-023-09904-1
CLC number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
The medical analysis needed to diagnose rare genetic diseases has rapidly become the most expensive and time-consuming part of the process for doctors. By combining predictive methods with growing knowledge of genetic disease, artificial intelligence (AI) has the potential to greatly simplify and accelerate genome interpretation. In this paper, multiple machine-learning models, namely support vector machine, Gaussian Naive Bayes, k-nearest neighbours (KNN), decision tree, gradient boosting, logistic regression, light gradient boosting (LGBM) classifier, random forest, extreme gradient boosting (XGB) classifier, and CatBoost, are applied to a genetic disorder dataset and a genetic disorder sub-class dataset. The data are first pre-processed to check for NaN values, and graphical representations across categories such as genetic disorder, genetic disorder sub-class, five samples of symptoms, genes inherited from the mother's and father's side, and birth defects are used to study their patterns. The features are then selected using a standardization technique, after which the machine-learning models are applied and evaluated using accuracy, loss, recall, precision, root mean square error, and F1 score. A confusion matrix is also generated to compute the false negative, true positive, false positive, and true negative values for the classes drawn from both datasets. The highest accuracy, 99.9%, was achieved by decision tree, random forest, gradient boosting, LGBM classifier, XGB classifier, and CatBoost for genetic disorder, whereas only random forest, decision tree, LGBM classifier, and CatBoost achieved 99.9% accuracy for genetic disorder sub-classes.
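The evaluation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names and the toy label lists are hypothetical, the classes are assumed to be integer-encoded, and each class is scored one-vs-rest from its own confusion-matrix counts.

```python
import math

def confusion_counts(y_true, y_pred, positive):
    """Count TP, FP, FN, TN for one class treated as positive (one-vs-rest)."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def class_metrics(tp, fp, fn, tn):
    """Derive accuracy, precision, recall and F1 from the four counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

def rmse(y_true, y_pred):
    """Root mean square error over integer-encoded class labels."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical toy labels (three classes encoded as 0, 1, 2); not from the paper.
y_true = [0, 0, 1, 1, 2, 2, 2, 1]
y_pred = [0, 1, 1, 1, 2, 2, 0, 1]

counts = confusion_counts(y_true, y_pred, positive=1)
print(counts)
print(class_metrics(*counts))
print(rmse(y_true, y_pred))
```

Computing RMSE on encoded class labels, as done here, depends on the arbitrary numeric coding of the classes, so it is best read alongside the confusion-matrix metrics rather than on its own.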
Pages: 3301-3323
Page count: 23