Integrating Information Gain and Chi-Square for Enhanced Malware Detection Performance

Cited by: 0
Authors
Rafrastara, Fauzi Adi [1]
Ghozi, Wildanil [1]
Sani, Ramadhan Rakhmat [1]
Handoko, Lekso Budi [1]
Abdussalam [1]
Pramudya, Elkaf Rahmawan [1]
Abdollah, Faizal M. [2]
Affiliations
[1] Univ Dian Nuswantoro, Fac Comp Sci, Semarang, Indonesia
[2] Univ Teknikal Malaysia Melaka, Fak Teknol Maklumat Dan Komunikasi, Melaka, Malaysia
Source
JOURNAL OF INFORMATION AND COMMUNICATION TECHNOLOGY-MALAYSIA | 2025 / Vol. 24 / Issue 01
Keywords
Malware detection; IGCS; feature selection; Information Gain; Chi-Square; CLASSIFICATION;
DOI
10.32890/jict2025.24.1.4
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Malware represents a serious and continuously evolving threat in the modern digital environment. Detecting malware is essential to safeguard devices and systems from risks such as data corruption, data theft, account compromises, and unauthorized access that could result in total system takeover. As malware has progressed from its simpler, monomorphic variants to more sophisticated forms like oligomorphic, polymorphic, and metamorphic, a machine learning-based detection system is now required, surpassing the limitations of traditional signature-based methods. Recent studies have shown that this challenge can be addressed by employing machine learning algorithms for detection. Some studies have also implemented various feature selection methods to optimize detection efficiency. However, they continue to struggle with false positives and false negatives, striving to reach zero tolerance in malware detection. This study introduces the IGCS method, a combined feature selection approach that integrates Information Gain with Chi-Square (χ²) to enhance both the effectiveness and efficiency of machine learning classifiers. Using IGCS, six classifiers (Random Forest, XGBoost, kNN, Decision Tree, Logistic Regression, and Naïve Bayes) achieved higher performance scores compared to other scenarios, such as when classifiers were combined with Information Gain, Chi-Square, PCA, or even without any feature selection. As a result, Random Forest with 30 features selected by IGCS proved superior to any combination of classifiers and feature selection methods in malware detection, achieving 99.0% accuracy, recall, precision, and F1-Score. This combination also demonstrated efficiency with a 52.5% decrease in training time and a 56.9% decrease in testing time.
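The abstract does not say how IGCS merges the two filter scores, so the following is only a minimal sketch of one plausible combination, not the authors' method. It assumes discrete features, computes Information Gain and the Pearson Chi-Square statistic per feature, rescales both to [0, 1], and averages them. The function names (`information_gain`, `chi_square`, `select_features`), the averaging rule, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """Entropy of the labels minus the expected entropy after splitting on the feature."""
    n = len(labels)
    conditional = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


def chi_square(feature, labels):
    """Pearson chi-square statistic of the feature-versus-label contingency table."""
    n = len(labels)
    observed = Counter(zip(feature, labels))
    feature_totals = Counter(feature)
    label_totals = Counter(labels)
    stat = 0.0
    for value, f_count in feature_totals.items():
        for label, l_count in label_totals.items():
            expected = f_count * l_count / n
            stat += (observed[(value, label)] - expected) ** 2 / expected
    return stat


def min_max(scores):
    """Rescale scores to [0, 1]; constant score lists map to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def select_features(columns, labels, k):
    """ASSUMED combination rule: average the normalised IG and chi-square
    scores and return the indices of the k best features, in index order."""
    ig = min_max([information_gain(col, labels) for col in columns])
    chi = min_max([chi_square(col, labels) for col in columns])
    combined = [(a + b) / 2 for a, b in zip(ig, chi)]
    ranked = sorted(range(len(columns)), key=lambda i: combined[i], reverse=True)
    return sorted(ranked[:k])


# Toy data (made up): 1 = malware, 0 = benign; one list per binary feature.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = [
    [1, 1, 1, 1, 0, 0, 0, 0],  # perfectly predictive
    [1, 0, 1, 0, 1, 0, 1, 0],  # independent of the label
    [1, 1, 1, 0, 0, 0, 0, 1],  # partially predictive
]
selected = select_features(columns, labels, k=2)
print(selected)  # prints [0, 2]
```

In this toy case the uninformative second feature scores zero on both criteria and is dropped. Continuous features would need discretisation first, and other combination rules (for example, intersecting the two top-k lists) are equally conceivable.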
Pages: 79-101
Number of pages: 23
Related papers
24 in total
  • [21] Comparing the Performance of FCBF, Chi-Square and Relief-F Filter Feature Selection Algorithms in Educational Data Mining
    Zaffar, Maryam
    Hashmani, Manzoor Ahmed
    Savita, K. S.
    RECENT TRENDS IN DATA SCIENCE AND SOFT COMPUTING, IRICT 2018, 2019, 843 : 151 - 160
  • [22] Risk Assessment Score and Chi-Square Automatic Interaction Detection Algorithm for Hypertension Among Africans: Models From the SIREN Study
    Asowata, Osahon J.
    Okekunle, Akinkunmi Paul
    Akpa, Onoja M.
    Fakunle, Adekunle Gregory
    Akinyemi, Joshua O.
    Komolafe, Morenikeji Adeyoyin
    Sarfo, Fred Stephen
    Akpalu, Albert K.
    Obiako, Reginald
    Wahab, Kolawole W.
    Osaigbovo, Godwin O.
    Owolabi, Lukman F.
    Jenkins, Carolyn M.
    Calys-Tagoe, Benedict Nii Laryea
    Arulogun, Oyedunni Sola
    Ogbole, Godwin I.
    Ogah, Okechukwu Samuel
    Lambert, Appiah T.
    Ibinaiye, Philip Oluleke
    Adebayo, Philip B.
    Singh, Arti
    Adeniyi, Sunday Adebori
    Mensah, Yaw B.
    Laryea, Ruth Y.
    Balogun, Olayemi
    Chukwuonye, Innocent Ijezie
    Akinyemi, Rufus O.
    Ovbiagele, Bruce
    Owolabi, Mayowa Ojo
    HYPERTENSION, 2023, 80 (12) : 2581 - 2590
  • [23] Enhanced QSAR Model Performance by Integrating Structural and Gene Expression Information
    Chen, Qian
    Wu, Leihong
    Liu, Wei
    Xing, Li
    Fan, Xiaohui
    MOLECULES, 2013, 18 (09) : 10789 - 10801
  • [24] Performance Comparison of Training Datasets for System Call-Based Malware Detection with Thread Information
    Kajiwara, Yuki
    Zheng, Junjun
    Mouri, Koichi
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2021, E104D (12) : 2173 - 2183