Cross-Regional Malware Detection via Model Distilling and Federated Learning

Cited by: 1
Authors
Botacin, Marcus [1]
Gomes, Heitor [2]
Affiliations
[1] Texas A&M Univ, College Stn, TX 77840 USA
[2] Victoria Univ Wellington, Wellington, New Zealand
Source
PROCEEDINGS OF 27TH INTERNATIONAL SYMPOSIUM ON RESEARCH IN ATTACKS, INTRUSIONS AND DEFENSES, RAID 2024 | 2024
Keywords
malware; federated learning; model distilling
DOI
10.1145/3678890.3678893
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
Machine Learning (ML) is a key part of modern malware detection pipelines, but its application is not straightforward. It involves multiple practical challenges that are frequently unaddressed by the literature works. A key challenge is the heterogeneity of scenarios. Antivirus (AV) companies for instance operate under different performance constraints in the backend and in the endpoint, and with a diversity of datasets according to the country they operate in. In this paper, we evaluate the impact of these heterogeneous aspects by developing a classification pipeline for 3 datasets of 10K malware samples each collected by an AV company in the USA, Brazil, and Japan in the same period. We characterize the different requirements for these datasets and we show that a different number of features is required to reach the optimal detection rate in each scenario. We show that a global model combining the three datasets increases the detection of the three individual datasets. We propose using Federated Learning (FL) to build the global model and a distilling process to generate the local versions. We order the samples temporally to show that although retraining on concept drift detection helps recover the detection rate, only a FL approach can increase the detection rate.
Pages: 97-113
Page count: 17