Large-Scale Malicious Software Classification With Fuzzified Features and Boosted Fuzzy Random Forest

Times cited: 7
Authors
Li, Fang-Qi [1]
Wang, Shi-Lin [1]
Liew, Alan Wee-Chung [2]
Ding, Weiping [3]
Liu, Gong-Shen [1]
Affiliations
[1] Shanghai Jiao Tong Univ, Sch Elect Informat & Elect Engn, Shanghai 200240, Peoples R China
[2] Griffith Univ, Sch Informat & Commun Technol, Gold Coast, Qld 4222, Australia
[3] Nantong Univ, Sch Informat Sci & Technol, Nantong 226019, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Malware; Feature extraction; Machine learning; Decision trees; Forestry; Support vector machines; Boosted random forest; computer security; fuzzy decision tree; malware classification; MACHINE; SYSTEM;
DOI
10.1109/TFUZZ.2020.3016023
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Classification of malicious software, especially in a very large dataset, is a challenging task for machine intelligence. Malware can have highly diversified features, each of which has a highly heterogeneous distribution. These factors make it difficult for traditional data analytic approaches to handle them. Although deep-learning-based methods have reported good classification performance, deep models usually lack interpretability and are fragile under adversarial attacks. To address these problems, fuzzy systems have become a competitive candidate in malware analysis. In this article, a new fuzzy-based approach is proposed for malware classification. We focused on portable executable files on the Windows platform and analyzed the distributions of static features and content-oriented features. Fuzzification was used to reduce the ubiquitous impact of noise and outliers in a very large dataset. Finally, a novel boosted classifier consisting of fuzzy decision trees and a support vector machine is proposed to perform the malware classification. By using fuzzy decision trees, the inner structure of the classifier can be readily interpreted as discriminative rules, whereas the novel boosting strategy provides state-of-the-art classification performance. Extensive experimental results showed that our method significantly outperformed several state-of-the-art classifiers.
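The abstract states that fuzzification was used to reduce the impact of noise and outliers, but it does not specify the membership functions. The following Python sketch assumes one common scheme, not necessarily the authors' own: each numeric feature is mapped to three overlapping triangular terms (low, medium, high) centered at the 5th, 50th, and 95th percentiles, with saturating shoulder terms at both ends. The function names quantile, term_centers, and fuzzify and the percentile choice are hypothetical. Because memberships saturate beyond the outer centers, an extreme value receives the same membership vector as a value at the top center, so a single outlier cannot dominate later splits.

```python
import math
from typing import List, Sequence


def quantile(sorted_vals: Sequence[float], q: float) -> float:
    """Linearly interpolated q-quantile (0 <= q <= 1) of an already sorted sequence."""
    pos = q * (len(sorted_vals) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def term_centers(values: Sequence[float],
                 percentiles: Sequence[float] = (0.05, 0.5, 0.95)) -> List[float]:
    """Center one fuzzy term (low / medium / high) at each percentile.

    Hypothetical choice, not taken from the paper: inner percentiles instead
    of min/max, so a few extreme outliers do not stretch the whole partition.
    """
    s = sorted(values)
    return [quantile(s, p) for p in percentiles]


def fuzzify(x: float, centers: Sequence[float]) -> List[float]:
    """Membership of x in triangular terms with saturating (shoulder) ends.

    Adjacent terms overlap so the degrees sum to 1. Anything beyond the
    outer centers gets full membership in the outer term, which bounds
    the influence of outliers.
    """
    mu = [0.0] * len(centers)
    if x <= centers[0]:
        mu[0] = 1.0
    elif x >= centers[-1]:
        mu[-1] = 1.0
    else:
        for i in range(len(centers) - 1):
            a, b = centers[i], centers[i + 1]
            if a <= x <= b:
                t = (x - a) / (b - a) if b > a else 0.0
                mu[i], mu[i + 1] = 1.0 - t, t
                break
    return mu


# Hypothetical heavy-tailed feature: 100 ordinary values plus one extreme outlier.
feature = [float(v) for v in range(1, 101)] + [1e9]
centers = term_centers(feature)
print(fuzzify(1e9, centers))   # [0.0, 0.0, 1.0]
print(fuzzify(51.0, centers))  # [0.0, 1.0, 0.0]
```

In this sketch the outlier 1e9 contributes exactly like any value at or above the top center. A fuzzy decision tree built on such features could then route samples by membership degree rather than raw magnitude.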
Pages: 3205-3218
Number of pages: 14