A Hypercuboid-Based Machine Learning Algorithm for Malware Classification

Cited by: 0
Authors
Thi Thu Trang Nguyen [1]
Dai Tho Nguyen [1]
Duy Loi Vu [1]
Affiliations
[1] VNU Univ Engn & Technol, Hanoi, Vietnam
Source
2021 RIVF INTERNATIONAL CONFERENCE ON COMPUTING AND COMMUNICATION TECHNOLOGIES (RIVF 2021) | 2021
Keywords
Malware classification; machine learning; k-nearest neighbors algorithms; prototype-based learning; hypercuboids;
DOI
10.1109/RIVF51545.2021.9642093
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Malware attacks have been among the most serious threats to cyber security in the last decade. Anti-malware software can help safeguard information systems and minimize their exposure to malware. Most anti-malware programs detect malware instances based on signature or pattern matching. Data mining and machine learning techniques can be used to automatically discover the models and patterns behind different types of malware variants. However, traditional machine learning techniques such as SVM, decision trees, and naive Bayes seem to be suitable only for detecting malicious code and are not effective enough for more complex problems such as classification. In this article, we propose a new prototype extraction method for non-traditional prototype-based machine learning classification. The prototypes are extracted using hypercuboids. Each hypercuboid covers all training data points of a malware family. We then choose the data points nearest to the hyperplanes as the prototypes. Malware samples are classified based on their distances to the prototypes. Experimental results show that the proposed method achieves an F1 score of 96.5% for classification of known malware and 97.7% for classification of unknown malware, both better than the original prototype-based classification method.
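The pipeline outlined in the abstract (cover each family with a hypercuboid, keep the points nearest to its bounding hyperplanes as prototypes, classify by distance to the prototypes) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not specify the details, so the sketch assumes an axis-aligned hypercuboid, one prototype per face, Euclidean distance, and a nearest-prototype decision rule. The function names and the toy 2-D data are hypothetical.

```python
import math


def bounding_hypercuboid(points):
    """Return per-dimension (mins, maxs) of the smallest axis-aligned box
    covering all points (assumption: the hypercuboid is axis-aligned)."""
    dims = range(len(points[0]))
    mins = [min(p[d] for p in points) for d in dims]
    maxs = [max(p[d] for p in points) for d in dims]
    return mins, maxs


def extract_prototypes(points, per_face=1):
    """Keep, for every face of the box, the `per_face` points closest to it.
    `per_face` is a hypothetical parameter, not taken from the paper."""
    mins, maxs = bounding_hypercuboid(points)
    chosen = []
    for d in range(len(mins)):
        for bound in (mins[d], maxs[d]):
            # Distance from a point to the face x[d] == bound.
            nearest = sorted(points, key=lambda p: abs(p[d] - bound))[:per_face]
            for p in nearest:
                if p not in chosen:  # a point may be nearest to several faces
                    chosen.append(p)
    return chosen


def classify(sample, prototypes_by_family):
    """Assign the family of the nearest prototype (assumption: Euclidean
    distance and a 1-nearest-prototype rule)."""
    best_family, best_dist = None, math.inf
    for family, prototypes in prototypes_by_family.items():
        for proto in prototypes:
            dist = math.dist(sample, proto)
            if dist < best_dist:
                best_family, best_dist = family, dist
    return best_family


# Toy 2-D feature vectors for two hypothetical malware families.
training = {
    "family_a": [(0.0, 0.0), (1.0, 0.5), (0.5, 1.0), (0.5, 0.5)],
    "family_b": [(5.0, 5.0), (6.0, 5.5), (5.5, 6.0), (5.5, 5.5)],
}
prototypes = {name: extract_prototypes(pts) for name, pts in training.items()}
predicted = classify((0.8, 0.9), prototypes)
```

In this toy data the interior point of each family is discarded, so only boundary points are compared at classification time. This is the motivation for prototype extraction: fewer distance computations than a full k-nearest-neighbors search over the training set.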
Pages: 301-306
Page count: 6