Integrating Information Gain and Chi-Square for Enhanced Malware Detection Performance

Cited by: 0
Authors
Rafrastara, Fauzi Adi [1]
Ghozi, Wildanil [1]
Sani, Ramadhan Rakhmat [1]
Handoko, Lekso Budi [1]
Abdussalam [1]
Pramudya, Elkaf Rahmawan [1]
Abdollah, Faizal M. [2]
Affiliations
[1] Univ Dian Nuswantoro, Fac Comp Sci, Semarang, Indonesia
[2] Univ Teknikal Malaysia Melaka, Fak Teknol Maklumat Dan Komunikasi, Melaka, Malaysia
Source
JOURNAL OF INFORMATION AND COMMUNICATION TECHNOLOGY-MALAYSIA | 2025 / Vol. 24 / No. 01
Keywords
Malware detection; IGCS; feature selection; Information Gain; Chi-Square; CLASSIFICATION;
DOI
10.32890/jict2025.24.1.4
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
Malware represents a serious and continuously evolving threat in the modern digital environment. Detecting malware is essential to safeguard devices and systems from risks such as data corruption, data theft, account compromises, and unauthorized access that could result in total system takeover. As malware has progressed from its simpler, monomorphic variants to more sophisticated forms like oligomorphic, polymorphic, and metamorphic, a machine learning-based detection system is now required, surpassing the limitations of traditional signature-based methods. Recent studies have shown that this challenge can be addressed by employing machine learning algorithms for detection. Some studies have also implemented various feature selection methods to optimize detection efficiency. However, they continue to struggle with false positives and false negatives, striving to reach zero tolerance in malware detection. This study introduces the IGCS method, a combined feature selection approach that integrates Information Gain with Chi-Square (χ²) to enhance both the effectiveness and efficiency of machine learning classifiers. Using IGCS, six classifiers (Random Forest, XGBoost, kNN, Decision Tree, Logistic Regression, and Naïve Bayes) achieved higher performance scores compared to other scenarios, such as when classifiers were combined with Information Gain, Chi-Square, PCA, or even without any feature selection. As a result, Random Forest with 30 features selected by IGCS proved superior to any combination of classifiers and feature selection methods in malware detection, achieving 99.0% accuracy, recall, precision, and F1-Score. This combination also demonstrated efficiency with a 52.5% decrease in training time and a 56.9% decrease in testing time.
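The abstract does not say how IGCS merges the two scores, so the following is only a minimal sketch of one plausible way to combine Information Gain and Chi-Square for feature selection. It assumes discrete features, and it assumes a rank-sum rule (each feature's rank under Information Gain plus its rank under Chi-Square); that rule, the function names, and the toy feature names and values are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """H(labels) - H(labels | feature) for a discrete feature."""
    n = len(labels)
    conditional = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


def chi_square(feature, labels):
    """Pearson chi-square statistic of the feature/label contingency table."""
    n = len(labels)
    feature_counts = Counter(feature)
    label_counts = Counter(labels)
    observed = Counter(zip(feature, labels))
    stat = 0.0
    for value, fc in feature_counts.items():
        for label, lc in label_counts.items():
            expected = fc * lc / n
            stat += (observed[(value, label)] - expected) ** 2 / expected
    return stat


def rank(scores):
    """Map feature name -> rank position (0 = highest score)."""
    ordered = sorted(scores, key=lambda name: (-scores[name], name))
    return {name: i for i, name in enumerate(ordered)}


def select_features(columns, labels, k):
    """Assumed combination rule: keep the k features with the lowest rank sum."""
    ig_rank = rank({n: information_gain(c, labels) for n, c in columns.items()})
    chi_rank = rank({n: chi_square(c, labels) for n, c in columns.items()})
    combined = {n: ig_rank[n] + chi_rank[n] for n in columns}
    return sorted(columns, key=lambda name: (combined[name], name))[:k]


# Toy data invented for illustration: 1 = malware, 0 = benign.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "calls_crypto_api": [1, 1, 1, 1, 0, 0, 0, 0],    # perfectly separates classes
    "has_packer_section": [1, 1, 1, 0, 0, 0, 0, 1],  # partially informative
    "file_is_signed": [1, 0, 1, 0, 1, 0, 1, 0],      # independent of the label
}
selected = select_features(columns, labels, k=2)
print(selected)
```

On this toy data both criteria agree on the ordering, so the rank sum simply keeps the two informative features. Other combination rules (for example, intersecting the top-k sets of each criterion) would be equally consistent with the abstract's description.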
Pages: 79-101
Page count: 23
Related papers
24 in total
  • [1] Malware Detection Using Semantic Features and Improved Chi-square
    Ha, Seung-Tae
    Hong, Sung-Sam
    Han, Myung-Mook
    JOURNAL OF INTERNET TECHNOLOGY, 2018, 19 (03) : 879 - 887
  • [2] Properties of chi-square statistic and information gain for feature selection of imbalanced text data
    Mun, Hye In
    Son, Won
    KOREAN JOURNAL OF APPLIED STATISTICS, 2022, 35 (04) : 469 - 484
  • [3] Distributed Detection System Using Wavelet Decomposition and Chi-Square Test
    Ouerfelli, Fatima Ezzahra
    Barbaria, Khaled
    Zouari, Belhassen
    Fachkha, Claude
    RISKS AND SECURITY OF INTERNET AND SYSTEMS (CRISIS 2019), 2020, 12026 : 365 - 377
  • [4] A Flow based Anomaly Detection System using Chi-square Technique
    Muraleedharan, N.
    Parmar, Arun
    Kumar, Manish
    2010 IEEE 2ND INTERNATIONAL ADVANCE COMPUTING CONFERENCE, 2010 : 285 - 289
  • [5] An Sql Injection Detection Model Using Chi-Square with Classification Techniques
    Adebiyi, Marion Olubunmi
    Arowolo, Micheal Olaolu
    Archibong, Goodnews Ime
    Mshelia, Moses Damilola
    Adebiyi, Ayodele Ariyo
    INTERNATIONAL CONFERENCE ON ELECTRICAL, COMPUTER AND ENERGY TECHNOLOGIES (ICECET 2021), 2021 : 289 - 296
  • [6] Incremental Attribute Reduction Method Based on Chi-Square Statistics and Information Entropy
    Su, Na
    An, Xinjun
    Yan, Changqing
    Ji, Shujuan
    IEEE ACCESS, 2020, 8 : 98234 - 98243
  • [7] An Improved Ensemble-Based Cardiovascular Disease Detection System with Chi-Square Feature Selection
    Korial, Ayad E.
    Gorial, Ivan Isho
    Humaidi, Amjad J.
    COMPUTERS, 2024, 13 (06)
  • [8] Chi-Square and PCA Based Feature Selection for Diabetes Detection with Ensemble Classifier
    Rupapara, Vaibhav
    Rustam, Furqan
    Ishaq, Abid
    Lee, Ernesto
    Ashraf, Imran
    INTELLIGENT AUTOMATION AND SOFT COMPUTING, 2023, 36 (02) : 1931 - 1949
  • [9] SCADA intrusion detection scheme exploiting the fusion of modified decision tree and Chi-square feature selection
    Ahakonye, Love Allen Chijioke
    Nwakanma, Cosmas Ifeanyi
    Lee, Jae-Min
    Kim, Dong-Seong
    INTERNET OF THINGS, 2023, 21
  • [10] Fusion of Chi-Square and Z-Test Statistics for Feature Selection with Machine Learning Techniques in Intrusion Detection
    Sharma, Amrendra Kumar
    Tiwari, Mamta
    ADVANCED NETWORK TECHNOLOGIES AND INTELLIGENT COMPUTING, ANTIC 2023, PT I, 2024, 2090 : 206 - 224