Features Engineering to Differentiate between Malware and Legitimate Software

Cited by: 5
Authors
Daeef, Ammar Yahya [1]
Al-Naji, Ali [2, 3]
Nahar, Ali K. [4]
Chahl, Javaan [3]
Affiliations
[1] Middle Tech Univ, Tech Inst Adm, Baghdad, Iraq
[2] Middle Tech Univ, Elect Engn Tech Coll, Baghdad, Iraq
[3] Univ South Australia, Sch Engn, Mawson Lakes, SA 5095, Australia
[4] Univ Technol Baghdad, Elect Engn Dept, Baghdad, Iraq
Source
APPLIED SCIENCES-BASEL | 2023 / Vol. 13 / Issue 03
Keywords
machine learning; Jaccard similarity; malware classification; API call sequence; DYNAMIC-ANALYSIS; SEQUENCE
DOI
10.3390/app13031972
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
Malware is the primary attack vector against the modern enterprise, so it is crucial for businesses to keep it out of their computer systems. The most responsive solution would operate in real time at the edge of the IT system using artificial intelligence. However, edge devices have limited memory and processing power, so any such solution must be lightweight. Application programming interface (API) calls are the best candidate on which to build one. The challenge is to create API call features that give a high malware detection rate with quick execution. To that end, this work uses visualisation analysis and Jaccard similarity to uncover the hidden patterns produced by different API calls. The study also compares neural networks that use long sequences of API calls with shallow machine learning classifiers. Three shallow classifiers are used: support vector machine (SVM), k-nearest neighbours (KNN), and random forest (RF). The benchmark data set comprises 43,876 examples of API call sequences, divided into two categories: malware and legitimate. The results show that RF performed similarly to long short-term memory (LSTM) networks and deep graph convolutional neural networks (DGCNNs). They also suggest the potential for performing inference on edge devices in a real-time setting.
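As a rough illustration of the Jaccard similarity mentioned in the abstract, the sketch below compares two API call traces by treating each as a set of distinct call names. The traces and the function name `jaccard_similarity` are hypothetical and chosen for illustration only; this is not the authors' feature-engineering pipeline, and the abstract does not say how the traces are represented or compared.

```python
def jaccard_similarity(trace_a, trace_b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| over the distinct API calls in two traces."""
    set_a, set_b = set(trace_a), set(trace_b)
    union = set_a | set_b
    if not union:
        # Assumed convention: two empty traces are treated as identical.
        return 1.0
    return len(set_a & set_b) / len(union)


# Hypothetical traces; the API names are real Windows calls, but the
# sequences themselves are made up and do not come from the benchmark data set.
malware_trace = ["CreateFileW", "WriteFile", "RegSetValueExW", "CreateRemoteThread"]
benign_trace = ["CreateFileW", "ReadFile", "WriteFile", "CloseHandle"]

score = jaccard_similarity(malware_trace, benign_trace)
print(f"Jaccard similarity: {score:.3f}")
```

Here the two traces share two of six distinct calls, so the score is one third. Working on sets discards call order and repetition, which keeps the computation cheap but loses sequence information.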
Pages: 13