A Machine Learning Classification Approach to Detect TLS-based Malware using Entropy-based Flow Set Features

Cited by: 3
Authors
Keshkeh, Kinan [1]
Jantan, Aman [1]
Alieyan, Kamal [2]
Affiliations
[1] Univ Sains Malaysia, Sch Comp Sci, Gelugor, Malaysia
[2] Amman Arab Univ, Fac Comp Sci & Informat, Amman, Jordan
Source
JOURNAL OF INFORMATION AND COMMUNICATION TECHNOLOGY-MALAYSIA | 2022 / Vol. 21 / No. 3
Keywords
Malware detection; machine learning; TLS; entropy; flow features
DOI
10.32890/jict2022.21.3.1
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Transport Layer Security (TLS)-based malware is one of the most hazardous malware types because it relies on encryption to conceal its connections. Because decrypting TLS traffic is complex, several anomaly-based studies have sought to detect TLS-based malware using various features and machine learning (ML) algorithms. However, most of these studies used flow features without any transformation or relied on inefficient flow feature transformations such as frequency-based periodicity analysis and outlier percentage. This paper introduces TLSMalDetect, a TLS-based malware detection approach that integrates periodicity-independent entropy-based flow set (EFS) features, generated by a flow feature transformation technique, to address the flow feature utilization issues in related research. The effectiveness of the EFS features was evaluated in two ways: (1) by comparing them with the corresponding outlier percentage and flow features using four feature importance methods, and (2) by analyzing classification performance with and without EFS features. Moreover, new Transmission Control Protocol (TCP) features not explored in the literature were incorporated into TLSMalDetect, and their contribution was assessed. The results showed that the EFS features of the number of packets sent and received were superior to the related outlier percentage and flow features and improved performance remarkably, by up to approximately 42 percent in the case of Support Vector Machine (SVM) accuracy. Furthermore, using the basic features, TLSMalDetect achieved its highest accuracy, 93.69 percent, with Naive Bayes (NB) among the ML algorithms applied. In comparison with previous studies, TLSMalDetect's Random Forest precision of 98.99 percent and NB recall of 92.91 percent exceeded the best relevant prior findings. These results indicate that TLSMalDetect detects a larger share of the total malicious flows than existing works (higher recall) and that a larger share of its alerts are true alerts than in earlier research (higher precision).
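The abstract does not specify how flows are grouped into sets or which entropy measure is used. As a minimal sketch, assuming Shannon entropy computed over the per-flow packet counts of all flows that share a hypothetical (source IP, destination IP, destination port) key, an EFS-style feature for the number of packets sent and received could be derived as follows. The `Flow` record, the grouping key, the function names, and the sample addresses are illustrative assumptions, not TLSMalDetect's actual implementation.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Flow(NamedTuple):
    # Hypothetical minimal flow record; real flow exporters provide many more fields.
    src_ip: str
    dst_ip: str
    dst_port: int
    pkts_sent: int
    pkts_recv: int


def shannon_entropy(values: Iterable[int]) -> float:
    """Shannon entropy (in bits) of the empirical distribution of discrete values."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Sum of -p * log2(p) over each distinct value's relative frequency p.
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def entropy_flow_set_features(
    flows: Iterable[Flow],
) -> Dict[Tuple[str, str, int], Tuple[float, float]]:
    """Group flows into sets by an assumed key and return, per set, the entropy
    of packets sent and the entropy of packets received."""
    sets: Dict[Tuple[str, str, int], List[Flow]] = defaultdict(list)
    for f in flows:
        sets[(f.src_ip, f.dst_ip, f.dst_port)].append(f)
    return {
        key: (
            shannon_entropy(f.pkts_sent for f in members),
            shannon_entropy(f.pkts_recv for f in members),
        )
        for key, members in sets.items()
    }


# Usage with made-up flows (documentation IP ranges).
flows = [
    Flow("10.0.0.5", "203.0.113.7", 443, 12, 10),
    Flow("10.0.0.5", "203.0.113.7", 443, 12, 10),
    Flow("10.0.0.5", "203.0.113.7", 443, 12, 10),
    Flow("10.0.0.5", "198.51.100.9", 443, 8, 30),
    Flow("10.0.0.5", "198.51.100.9", 443, 25, 120),
]
feats = entropy_flow_set_features(flows)
print(feats[("10.0.0.5", "203.0.113.7", 443)])   # (0.0, 0.0)
print(feats[("10.0.0.5", "198.51.100.9", 443)])  # (1.0, 1.0)
```

Under these assumptions, a set of identical, repeated connections (as beaconing malware might produce) scores an entropy of zero no matter how the flows are spaced in time, which illustrates one way a feature can avoid depending on periodicity; more varied traffic tends to score higher.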
Pages: 279–313
Number of pages: 35