A Universal Malicious Documents Static Detection Framework Based on Feature Generalization

Cited by: 9
Authors
Lu, Xiaofeng [1]
Wang, Fei [1]
Jiang, Cheng [1]
Lio, Pietro [2]
Affiliations
[1] Beijing Univ Posts & Telecommun, Sch Cyberspace Secur, Beijing 100876, Peoples R China
[2] Univ Cambridge, Comp Lab, Cambridge CB3 0FD, England
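Source
APPLIED SCIENCES-BASEL | 2021 / Vol. 11 / Issue 24
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords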
malicious document detection; static detection; feature generalization; machine learning;
DOI
10.3390/app112412134
Chinese Library Classification (CLC) number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
In this study, Portable Document Format (PDF), Word, Excel, Rich Text Format (RTF), and image documents are taken as the research objects in order to develop a fast static method for detecting malicious documents. Features of malicious PDF and Word documents are abstracted and extended so that they can also be used to detect other document types. A universal static detection framework for malicious documents based on feature generalization is then proposed. The generalized features include specification check errors, the structure path, code keywords, and the number of objects. The proposed method is verified on two datasets and compared with the Kaspersky, NOD32, and McAfee antivirus software. The experimental results demonstrate that the proposed method performs well in terms of detection accuracy, runtime, and scalability. The average F1-score across all document types is 0.99, and the average detection time per document is 0.5926 s, which is at the same level as the compared antivirus software.
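To make the feature categories in the abstract more concrete, the following is a minimal sketch of how three of them (code keywords, number of objects, and specification check errors) might be computed statically for a PDF file. It is not the authors' implementation: the function name `extract_features`, the keyword list, and the two header/trailer checks are assumptions chosen for illustration, and the structure-path feature is omitted.

```python
import re

# Hypothetical keyword list for illustration only; the keyword set used in
# the paper is not given in this record.
PDF_KEYWORDS = [b"/JavaScript", b"/JS", b"/OpenAction", b"/Launch", b"/EmbeddedFile"]

# Matches the start of an indirect object, e.g. "12 0 obj".
OBJ_PATTERN = re.compile(rb"\d+\s+\d+\s+obj\b")


def extract_features(data: bytes) -> dict:
    """Build a small static feature dictionary from raw PDF bytes."""
    # Code keywords: raw occurrence count of each keyword.
    features = {kw.decode("ascii"): data.count(kw) for kw in PDF_KEYWORDS}
    # Number of objects: count of indirect object headers.
    features["num_objects"] = len(OBJ_PATTERN.findall(data))
    # Specification check errors (assumed, very crude): a missing "%PDF-"
    # header and a missing "%%EOF" marker each count as one error.
    features["spec_errors"] = int(not data.startswith(b"%PDF-")) + int(b"%%EOF" not in data)
    return features
```

A feature dictionary of this kind could then be passed to any standard classifier. Because the file is only read and never opened in a viewer, the check stays static, which is what keeps per-document detection time low.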
Pages: 23