Performance Comparison of Feature Selection Methods for Prediction in Medical Data

Cited by: 3
Authors
Khalid, Nur Hidayah Mohd [1]
Ismail, Amelia Ritahani [1]
Aziz, Normaziah Abdul [1]
Hussin, Amir Aatieff Amir [1]
Affiliations
[1] Int Islamic Univ Malaysia, Dept Comp Sci, Kulliyyah Informat & Commun Technol, POB 10, Kuala Lumpur 50728, Malaysia
Source
SOFT COMPUTING IN DATA SCIENCE, SCDS 2023 | 2023 / Vol. 1771
Keywords
CatBoost; Feature selection; RFE; Lasso
DOI
10.1007/978-981-99-0405-1_7
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Along with technological advancement, the application of machine learning algorithms in industry, notably in the medical field, has grown quickly. Medical databases commonly contain a large amount of information about patients' medical histories and conditions, which makes it challenging to identify and extract the information that is relevant and meaningful for machine learning modelling. Moreover, the efficacy of a predictive machine learning algorithm can be enhanced by using only useful and pertinent information. Feature selection is therefore proposed to determine the significant features, and it should be applied when building machine learning models. This study analyzes filter, wrapper, and embedded feature selection methods for medical data with two predictive machine learning algorithms, Random Forest and CatBoost. The experiment evaluates the performance of the models with and without feature selection. According to the results, CatBoost with RFE shows the best performance in comparison to Random Forest with other feature selection methods.
Pages: 92-106
Number of pages: 15
Related papers
45 in total
[1]  
Aggrawal R., 2020, SN Comput. Sci., V1, P1, DOI 10.1007/s42979-020-00370-1
[2]   Comparative Study of Optimum Medical Diagnosis of Human Heart Disease Using Machine Learning Technique With and Without Sequential Feature Selection [J].
Ahmad, Ghulab Nabi ;
Shafiullah ;
Algethami, Abdullah ;
Fatima, Hira ;
Akhter, Syed Md Humayun .
IEEE ACCESS, 2022, 10 :23808-23828
[3]   Development of quantitative model of a local lymph node assay for evaluating skin sensitization potency applying machine learning CatBoost [J].
Ambe, Kaori ;
Suzuki, Masaharu ;
Ashikaga, Takao ;
Tohkin, Masahiro .
REGULATORY TOXICOLOGY AND PHARMACOLOGY, 2021, 125
[4]   Automated detection of heart valve disorders with time-frequency and deep features on PCG signals [J].
Arslan, Ozkan .
BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2022, 78
[5]   A Comparative Performance Evaluation of Supervised Feature Selection Algorithms on Microarray Datasets [J].
ArunKumar, C. ;
Sooraj, M. P. ;
Ramakrishnan, S. .
7TH INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING & COMMUNICATIONS (ICACC-2017), 2017, 115 :209-217
[6]  
Aziz R, 2017, AIMS Bioeng, V4, P179, DOI 10.3934/BIOENG.2017.1.179
[7]   A Review of the Application of Information Theory to Clinical Diagnostic Testing [J].
Benish, William A. .
ENTROPY, 2020, 22 (01) :97
[8]  
Cava William La, 2019, AMIA Annu Symp Proc, V2019, P572
[9]   Parkinson's Disease in Women and Men: What's the Difference? [J].
Cerri, Silvia ;
Mus, Liudmila ;
Blandini, Fabio .
JOURNAL OF PARKINSONS DISEASE, 2019, 9 (03) :501-515
[10]   Ensemble feature selection in medical datasets: Combining filter, wrapper, and embedded feature selection results [J].
Chen, Chih-Wen ;
Tsai, Yi-Hong ;
Chang, Fang-Rong ;
Lin, Wei-Chao .
EXPERT SYSTEMS, 2020, 37 (05)