API2Vec: Learning Representations of API Sequences for Malware Detection

Cited by: 11

Authors:

Cui, Lei [1]

Cui, Jiancong [2,3]

Ji, Yuede [4]

Hao, Zhiyu [1]

Li, Lun [3]

Ding, Zhenquan [3]

Affiliations:

[1] Zhongguancun Lab, Beijing, Peoples R China

[2] Univ Chinese Acad Sci, Sch Cyber Secur, Beijing, Peoples R China

[3] Chinese Acad Sci, Inst Informat Engn, Beijing, Peoples R China

[4] Univ North Texas, Dallas, TX USA

Source:

PROCEEDINGS OF THE 32ND ACM SIGSOFT INTERNATIONAL SYMPOSIUM ON SOFTWARE TESTING AND ANALYSIS, ISSTA 2023 | 2023

Funding:

National Natural Science Foundation of China;

Keywords:

Malware Detection; Embedding; Deep Learning; Random Walk; CLASSIFICATION;

DOI:

10.1145/3597926.3598054

Chinese Library Classification (CLC) number:

TP31 [Computer Software];

Subject classification code:

081202 ; 0835 ;

Abstract:

Analyzing malware based on API call sequences is an effective approach because the sequence reflects the dynamic execution behavior of malware. Recent advances in deep learning have led to the application of these techniques to mining useful information from API call sequences. However, these methods mainly operate on raw sequences and may not effectively capture important information, especially for multi-process malware, mainly because of the API call interleaving problem. Motivated by this, this paper presents API2Vec, a graph-based API embedding method for malware detection. First, we build a graph model to represent the raw sequence. In particular, we design the temporal process graph (TPG) to model inter-process behavior and the temporal API graph (TAG) to model intra-process behavior. With these graphs, we design a heuristic random walk algorithm to generate a number of paths that capture fine-grained malware behavior. By pre-training on the paths using the Doc2Vec model, we generate embeddings of paths and APIs, which can then be used for malware detection. Experiments on a real malware dataset demonstrate that API2Vec outperforms state-of-the-art embedding methods and detection methods in both accuracy and robustness, especially for multi-process malware.
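The interleaving problem and the graph-then-walk idea in the abstract can be illustrated with a small sketch. This is not the authors' TPG/TAG construction or their heuristic walk, which the abstract does not specify. It assumes a trace is a list of `(pid, api_name)` pairs in global time order, builds one weighted API transition graph per process, and samples a plain frequency-weighted random walk. The function names `build_transition_graphs` and `random_walk` and the sample trace are hypothetical.

```python
import random
from collections import defaultdict


def build_transition_graphs(events):
    """Build one directed, weighted API transition graph per process.

    events: list of (pid, api_name) pairs in global time order (assumed format).
    Returns {pid: {api: {next_api: count}}}.
    """
    graphs = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    last_api = {}  # most recent API seen for each pid
    for pid, api in events:
        if pid in last_api:
            # Link consecutive calls of the SAME process, ignoring interleaving.
            graphs[pid][last_api[pid]][api] += 1
        last_api[pid] = api
    return graphs


def random_walk(graph, start, length, rng):
    """Sample a path of at most `length` APIs, weighting edges by frequency.

    A plain weighted walk, standing in for the paper's heuristic walk.
    """
    path = [start]
    node = start
    while len(path) < length:
        successors = graph.get(node)
        if not successors:
            break  # dead end: no outgoing transitions recorded
        apis = list(successors)
        weights = [successors[a] for a in apis]
        node = rng.choices(apis, weights=weights, k=1)[0]
        path.append(node)
    return path


# Hypothetical trace: the calls of two processes are interleaved in time.
events = [
    (1, "CreateFile"), (2, "RegOpenKey"),
    (1, "WriteFile"), (2, "RegSetValue"),
    (1, "CloseHandle"), (2, "RegCloseKey"),
]
graphs = build_transition_graphs(events)
path = random_walk(graphs[1], "CreateFile", 3, random.Random(0))
print(path)
```

In the raw trace, `CreateFile` is followed by `RegOpenKey`, a call from a different process. The per-process graph instead links `CreateFile` to `WriteFile`. Paths sampled this way could then be passed as token sequences to a Doc2Vec implementation, which is the pre-training step the abstract describes.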

Citation

Pages: 261-273

Page count: 13

