A hybrid machine learning approach of fuzzy-rough-k-nearest neighbor, latent semantic analysis, and ranker search for efficient disease diagnosis

Cited by: 4
Authors
Jha, Sunil Kumar [1]
Marina, Ninoslav [2]
Wang, Jinwei [1]
Ahmad, Zulfiqar [3, 4]
Affiliations
[1] Nanjing Univ Informat Sci & Technol, Sch Comp & Software, Nanjing 210044, Peoples R China
[2] Univ Informat Sci & Technol St Paul Apostle, Ohrid, North Macedonia
[3] Chinese Acad Sci, Inst Hydrobiol, Wuhan, Peoples R China
[4] Univ Calif Riverside, Dept Environm Sci, Riverside, CA 92521 USA
Keywords
Hybrid machine learning; fuzzy nearest neighbor; disease diagnosis prediction; feature generation and selection; instance selection; classification; algorithms; system
DOI
10.3233/JIFS-211820
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Machine learning approaches contribute substantially to the competency of automated decision systems. Past studies have developed several machine learning approaches for predicting the diagnosis of individual diseases. The present study aims to develop a hybrid machine learning approach for the diagnosis prediction of multiple diseases by combining efficient feature generation, feature selection, and classification methods. Specifically, a combination of latent semantic analysis, ranker search, and fuzzy-rough-k-nearest neighbor is proposed and validated on diagnosis prediction for primary tumor, post-operative, breast cancer, lymphography, audiology, fertility, immunotherapy, and COVID-19, among others. The performance of the proposed approach is compared with single and other hybrid machine learning approaches in terms of accuracy, analysis time, precision, recall, F-measure, area under the ROC curve, and the Kappa coefficient. The proposed hybrid approach performs better than the single and other hybrid approaches in the diagnosis prediction of each of the selected diseases. In particular, it achieved a maximum recognition accuracy of 99.12% for primary tumor, 96.45% for breast cancer Wisconsin, 94.44% for cryotherapy, and 93.81% for audiology, along with significant improvements in classification accuracy and the other evaluation metrics for the remaining diseases. It also handles missing values in the dataset effectively.
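
To make the classification stage of the abstract more concrete, the following is a minimal sketch of one common formulation of fuzzy-rough nearest neighbor classification (lower and upper approximations computed over the k most similar training samples). It is not taken from the paper: the similarity relation, the Kleene-Dienes implicator, the minimum t-norm, the function names `similarity` and `frnn_classify`, and the toy data are all assumptions chosen for illustration. The latent semantic analysis and ranker search stages are omitted, and missing values are not handled here.

```python
from typing import Dict, List, Sequence, Tuple

# One labeled training sample: (feature vector, class label). Hypothetical type alias.
Sample = Tuple[Sequence[float], str]


def similarity(x: Sequence[float], y: Sequence[float], ranges: Sequence[float]) -> float:
    """Fuzzy similarity in [0, 1]: the worst range-normalized closeness over all attributes."""
    closeness = []
    for a, b, r in zip(x, y, ranges):
        if r > 0:
            closeness.append(max(0.0, 1.0 - abs(a - b) / r))
        else:
            closeness.append(1.0)  # constant attribute carries no information
    return min(closeness)


def frnn_classify(train: List[Sample], query: Sequence[float], k: int = 3) -> Tuple[str, Dict[str, float]]:
    """Return the predicted label and the per-class fuzzy-rough scores for `query`."""
    dims = len(query)
    # Attribute ranges are estimated from the training data (an assumption of this sketch).
    ranges = [
        max(x[i] for x, _ in train) - min(x[i] for x, _ in train)
        for i in range(dims)
    ]
    # Keep the k training samples most similar to the query.
    neighbors = sorted(
        ((similarity(x, query, ranges), label) for x, label in train),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]

    scores: Dict[str, float] = {}
    for cls in sorted({label for _, label in train}):
        # Lower approximation with the Kleene-Dienes implicator I(a, b) = max(1 - a, b).
        lower = min(max(1.0 - s, 1.0 if label == cls else 0.0) for s, label in neighbors)
        # Upper approximation with the minimum t-norm T(a, b) = min(a, b).
        upper = max(min(s, 1.0 if label == cls else 0.0) for s, label in neighbors)
        scores[cls] = (lower + upper) / 2.0
    return max(scores, key=scores.get), scores


# Toy, made-up data with two features already scaled to [0, 1].
train_data: List[Sample] = [
    ((0.0, 0.0), "benign"),
    ((0.1, 0.2), "benign"),
    ((0.9, 1.0), "malignant"),
    ((1.0, 0.8), "malignant"),
]
predicted, class_scores = frnn_classify(train_data, (0.2, 0.1), k=3)
print(predicted, class_scores)
```

In this sketch a class scores highly only when the similar neighbors belong to it (upper approximation) and no similar neighbor contradicts it (lower approximation), which is what distinguishes the fuzzy-rough variant from a plain majority-vote k-nearest neighbor.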
Pages: 2549-2563
Page count: 15