Malicious Log Detection Using Machine Learning to Maximize the Partial AUC

Cited by: 1

Authors:

Nishiyama, Taishi [1]

Kumagai, Atsutoshi [2]

Fujino, Akinori [2]

Kamiya, Kazunori [1]

Affiliations:

[1] NTT Secur Japan, Tokyo, Japan

[2] NTT Labs, Tokyo, Japan

Source:

2024 IEEE 21st Consumer Communications & Networking Conference (CCNC) | 2024

Keywords:

Partial AUC; Malware Detection; Log Analysis;

DOI:

10.1109/CCNC51664.2024.10454779

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

A recent trend in security log analysis is to use machine learning to detect malware. Machine learning reduces manual labor and provides a countermeasure that can adapt to constantly evolving malware. When evaluating the classification performance of malicious log detection, the true positive rate (TPR) in a low false positive rate (FPR) interval is widely recognized as important, since network operators want to detect as much malware as possible while keeping false positives on benign logs low. However, conventional supervised learning methods cannot directly maximize the TPR in a low FPR interval because they are trained to maximize accuracy. This paper therefore proposes a method to maximize the partial area under the receiver operating characteristic curve (pAUC), which is the mean TPR over a specific FPR interval. The proposed method takes a conventional supervised learning method as a baseline, replaces its objective function with one that maximizes the pAUC, and trains the model with the proposed algorithm. Its advantage is high applicability: it can be implemented on top of any conventional supervised learning method for binary classification by modifying the objective function. We compared the proposed method, i.e., pAUC maximization applied to various supervised learning models, with the baseline supervised learning methods on a public dataset (NSL-KDD) and on a dataset of proxy logs from a real-world large enterprise network. The results show that the proposed method outperforms the baselines on several performance measures, including the pAUC, AUC, and TPR at a low FPR. They suggest that the proposed method is beneficial in actual operation, since it detects more malware than conventional supervised learning methods at the same FPR.
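
To make the pAUC definition in the abstract concrete, the following is a minimal sketch of how the mean TPR over a low FPR interval might be estimated from classifier scores. It is not the authors' algorithm. The function names `partial_auc` and `pauc_surrogate_loss`, the interval [0, beta], and the logistic pairwise surrogate are all assumptions chosen for illustration; the surrogate is one common relaxation, not necessarily the objective used in the paper.

```python
import numpy as np


def _hardest_negatives(scores_neg, beta):
    """Return the k highest-scoring benign samples, k = floor(beta * n_neg).

    These are the negatives that become false positives first, so they are
    the only ones that matter for FPR <= beta. (Hypothetical helper.)
    """
    scores_neg = np.asarray(scores_neg, dtype=float)
    # Small epsilon guards against float error such as 0.29 * 100 = 28.999...
    k = max(1, int(np.floor(beta * scores_neg.size + 1e-9)))
    return np.sort(scores_neg)[::-1][:k]


def partial_auc(scores_pos, scores_neg, beta=0.1):
    """Empirical pAUC over the FPR interval [0, beta], normalized to [0, 1].

    Equals the mean TPR over that interval: the fraction of
    (malicious, hard benign) pairs ranked correctly, ties counted as half.
    """
    scores_pos = np.asarray(scores_pos, dtype=float)
    hard_neg = _hardest_negatives(scores_neg, beta)
    diff = scores_pos[:, None] - hard_neg[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


def pauc_surrogate_loss(scores_pos, scores_neg, beta=0.1):
    """Smooth pairwise logistic loss on the same pairs (assumed relaxation).

    Minimizing it pushes malicious scores above the hardest benign scores.
    """
    scores_pos = np.asarray(scores_pos, dtype=float)
    hard_neg = _hardest_negatives(scores_neg, beta)
    diff = scores_pos[:, None] - hard_neg[None, :]
    # log(1 + exp(-diff)), computed in a numerically stable way
    return float(np.mean(np.logaddexp(0.0, -diff)))


# Toy scores: 3 malicious logs, 10 benign logs (made-up values).
pos = [0.9, 0.8, 0.3]
neg = [0.7, 0.5, 0.2, 0.1, 0.1, 0.05, 0.04, 0.03, 0.02, 0.01]

pauc_low_fpr = partial_auc(pos, neg, beta=0.2)  # only the 2 hardest negatives
full_auc = partial_auc(pos, neg, beta=1.0)      # all negatives: ordinary AUC
print(pauc_low_fpr, full_auc)
```

On these toy scores the full AUC is high while the pAUC at FPR <= 0.2 is noticeably lower, because one malicious sample scores below the two hardest benign samples. This is the gap that an accuracy-oriented objective can leave unaddressed and that a pAUC-oriented objective targets.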

Pages: 339-344

Page count: 6

