Windows malware detection based on static analysis with multiple features

Times cited: 3
Authors
Yousuf, Muhammad Irfan [1]
Anwer, Izza [2]
Riasat, Ayesha [3]
Zia, Khawaja Tahir [1]
Kim, Suhyun [4]
Affiliations
[1] Univ Engn & Technol Lahore, Dept Comp Sci, Lahore, Pakistan
[2] Univ Engn & Technol Lahore, Dept Transportat Engn & Management, Lahore, Pakistan
[3] Univ Engn & Technol Lahore, Dept Basic Sci & Humanities, Lahore, Pakistan
[4] Korea Inst Sci & Technol, Ctr Artificial Intelligence, Seoul, South Korea
Keywords
Static malware analysis; Windows PE; Machine learning; Multiple features
DOI
10.7717/peerj-cs.1319
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Malware, or malicious software, is intrusive software that infects a computer under attack or performs harmful activities on it. Malware has threatened individuals and organizations since the dawn of computing, and the research community continues to search for efficient methods to detect it. In this work, we present a static malware detection system that detects Portable Executable (PE) malware in the Windows environment and classifies samples as benign or malicious with high accuracy. First, we collect a total of 27,920 Windows PE malware samples divided into six categories and create a new dataset by extracting four types of information: the imported DLLs, the API functions called by these samples, the values of 52 PE Header attributes, and the values of 100 PE Section attributes. We also combine this information into two integrated feature sets. Second, we apply seven machine learning models (gradient boosting, decision tree, random forest, support vector machine, K-nearest neighbor, naive Bayes, and nearest centroid) and three ensemble learning techniques (Majority Voting, Stacked Generalization, and AdaBoost) to classify the malware. Third, to further improve the performance of our malware detection system, we deploy two dimensionality reduction techniques: Information Gain and Principal Component Analysis. We perform a number of experiments to test the performance and robustness of our system on both raw and selected features and show that it outperforms previous studies. By combining machine learning, ensemble learning, and dimensionality reduction techniques, we construct a static malware detection system that achieves a detection rate of 99.5% and an error rate of only 0.47%.
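The Information Gain step and the Majority Voting ensemble named in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation: it assumes binary presence features (for example, whether a sample imports a given DLL or calls a given API function), uses hypothetical feature names and a hand-made six-sample toy dataset, and implements plain hard (plurality) voting. Continuous PE Header or PE Section values would need to be discretized before this form of Information Gain applies.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy H(Y), in bits, of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels):
    """IG(Y; X) = H(Y) - sum over values v of P(X = v) * H(Y | X = v)."""
    n = len(labels)
    conditional = 0.0
    for value in set(column):
        # Labels of the samples whose feature takes this value.
        subset = [y for x, y in zip(column, labels) if x == value]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


def select_top_k(rows, labels, names, k):
    """Keep the k feature names with the highest information gain."""
    scores = {
        name: information_gain([row[i] for row in rows], labels)
        for i, name in enumerate(names)
    }
    return sorted(names, key=lambda name: scores[name], reverse=True)[:k]


def majority_vote(predictions):
    """Hard voting: return the label predicted by the most classifiers."""
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical binary features (1 = present) and toy labels (1 = malware, 0 = benign).
FEATURES = ["imports_ws2_32", "calls_VirtualAllocEx", "has_debug_dir"]
ROWS = [
    (1, 1, 1),
    (1, 1, 0),
    (0, 1, 0),
    (0, 0, 1),
    (1, 0, 0),
    (0, 0, 0),
]
LABELS = [1, 1, 1, 0, 0, 0]

if __name__ == "__main__":
    print(select_top_k(ROWS, LABELS, FEATURES, k=2))
    print(majority_vote(["malware", "benign", "malware"]))
```

On this toy data, `calls_VirtualAllocEx` separates the two classes perfectly (1 bit of information gain), `imports_ws2_32` carries a little information, and `has_debug_dir` carries none, so the two selected features are `calls_VirtualAllocEx` and `imports_ws2_32`. In a real pipeline the votes passed to `majority_vote` would come from trained classifiers such as the seven models listed in the abstract.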
Pages: 29