MOJI: Character-level convolutional neural networks for Malicious Obfuscated JavaScript Inspection

Cited by: 0
Authors
Ishida, Minato [1]
Kaneko, Naoshi [1]
Sumi, Kazuhiko [1]
Affiliation
[1] Aoyama Gakuin University, 5-10-1 Fuchinobe, Chuo-ku, Sagamihara, Kanagawa 252-5258, Japan
Keywords
Malware detection; JavaScript; Convolutional neural networks; Deep learning
DOI
10.1016/j.asoc.2023.110138
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
JavaScript malware is one of the major threats to web security. A major challenge in detecting malicious JavaScript is obfuscation, which transforms program code into a harder-to-understand representation while preserving its original functionality. Many malicious JavaScript detection methods perform code abstraction and prior feature extraction to uncover the functionality hidden by obfuscation. However, these preprocessing steps significantly limit the detectors' efficiency in practical settings. This paper presents the Malicious Obfuscated JavaScript Inspector (MOJI), a novel method for malicious JavaScript detection that requires neither code abstraction nor prior feature extraction. Instead, our detector directly accepts the sequence of characters in a piece of JavaScript code as input and outputs its maliciousness score. Specifically, we design a character-level convolutional neural network consisting mainly of several 1D convolutional layers and fully connected layers. We evaluate the proposed method on a dataset of 24,000 JavaScript code samples and show that it outperforms existing malicious JavaScript detectors in both detection performance and running time. We also analyze the effect of additional obfuscation on the same dataset. Our results indicate that MOJI is far more robust to obfuscation than existing methods and commercial antivirus software.
© 2023 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
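The abstract describes the detector only at a high level: raw characters in, several 1D convolutional layers and fully connected layers, a maliciousness score out. A minimal, dependency-free sketch of such a character-level CNN forward pass is shown below. Every concrete choice is an assumption for illustration and not the MOJI configuration: the byte-level vocabulary (UTF-8 byte value + 1, with 0 as padding), the truncation length `MAX_LEN`, the layer widths and kernel size, global max pooling, and the helper names `encode`, `conv1d_relu`, `dense`, `sigmoid` and `CharCNN` are all hypothetical. The weights are random, so the score is meaningless until the model is trained on labeled benign and malicious scripts.

```python
import math
import random

# Hypothetical settings for illustration only; the abstract does not give MOJI's values.
MAX_LEN = 64   # characters (UTF-8 bytes here) kept per script; longer input is truncated
VOCAB = 257    # byte values 0..255 shifted by +1, with ID 0 reserved for padding


def encode(source, max_len=MAX_LEN):
    """Turn raw JavaScript text into a fixed-length list of integer IDs (no parsing, no features)."""
    ids = [b + 1 for b in source.encode("utf-8")[:max_len]]
    return ids + [0] * (max_len - len(ids))


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def conv1d_relu(x, kernels, bias):
    """'Valid' 1D convolution along the sequence, then ReLU.

    x: L vectors of size C_in; kernels: C_out x k x C_in; returns L - k + 1 vectors of size C_out.
    """
    k = len(kernels[0])
    out = []
    for t in range(len(x) - k + 1):
        window = x[t:t + k]
        row = []
        for kernel, b in zip(kernels, bias):
            s = b
            for w_row, x_row in zip(kernel, window):
                s += sum(w * v for w, v in zip(w_row, x_row))
            row.append(max(0.0, s))
        out.append(row)
    return out


def dense(x, weights, bias):
    """Fully connected layer; weights is out_dim x in_dim."""
    return [b + sum(w * v for w, v in zip(row, x)) for row, b in zip(weights, bias)]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class CharCNN:
    """Embedding -> stacked Conv1D+ReLU -> global max pool -> FC+ReLU -> FC -> sigmoid.

    Requires MAX_LEN >= 1 + len(channels) * (kernel_size - 1) so every conv layer has output.
    """

    def __init__(self, emb_dim=8, channels=(16, 16), kernel_size=5, hidden=8, seed=0):
        rng = random.Random(seed)  # random, untrained weights
        self.embedding = rand_matrix(VOCAB, emb_dim, rng)
        self.convs = []
        c_in = emb_dim
        for c_out in channels:
            kernels = [rand_matrix(kernel_size, c_in, rng) for _ in range(c_out)]
            self.convs.append((kernels, [0.0] * c_out))
            c_in = c_out
        self.fc1 = (rand_matrix(hidden, c_in, rng), [0.0] * hidden)
        self.fc2 = (rand_matrix(1, hidden, rng), [0.0])

    def score(self, source):
        """Return a maliciousness score between 0 and 1 for a JavaScript source string."""
        x = [self.embedding[i] for i in encode(source)]  # MAX_LEN x emb_dim
        for kernels, bias in self.convs:
            x = conv1d_relu(x, kernels, bias)            # sequence shrinks by k - 1 per layer
        pooled = [max(col) for col in zip(*x)]           # one value per channel
        hidden = [max(0.0, v) for v in dense(pooled, *self.fc1)]
        return sigmoid(dense(hidden, *self.fc2)[0])


if __name__ == "__main__":
    model = CharCNN()
    # Prints a float between 0 and 1; it carries no meaning until the weights are trained.
    print(model.score('eval(unescape("%61%6c%65%72%74%28%31%29"))'))
```

Because `encode` only looks at raw bytes, an obfuscated rewrite of a script flows through exactly the same path as the original, with no parser or hand-crafted features in between. Global max pooling in this sketch keeps the strongest response of each filter regardless of where in the script it fires; whether learned filters stay discriminative under added obfuscation is the empirical question the paper evaluates on its dataset.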
Pages: 12