API2Vec: Learning Representations of API Sequences for Malware Detection
Cited by: 11
Authors:
Cui, Lei [1]
Cui, Jiancong [2,3]
Ji, Yuede [4]
Hao, Zhiyu [1]
Li, Lun [3]
Ding, Zhenquan [3]
Affiliations:
[1] Zhongguancun Lab, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Sch Cyber Secur, Beijing, Peoples R China
[3] Chinese Acad Sci, Inst Informat Engn, Beijing, Peoples R China
[4] Univ North Texas, Dallas, TX USA
Source:
PROCEEDINGS OF THE 32ND ACM SIGSOFT INTERNATIONAL SYMPOSIUM ON SOFTWARE TESTING AND ANALYSIS, ISSTA 2023 | 2023
Funding:
National Natural Science Foundation of China;
Keywords:
Malware Detection;
Embedding;
Deep Learning;
Random Walk;
CLASSIFICATION;
DOI:
10.1145/3597926.3598054
Chinese Library Classification (CLC) number:
TP31 [Computer Software];
Discipline classification codes:
081202 ;
0835 ;
Abstract:
Analyzing malware based on API call sequences is an effective approach, as the sequence reflects the dynamic execution behavior of malware. Recent advancements in deep learning have led to the application of these techniques for mining useful information from API call sequences. However, these methods mainly operate on raw sequences and may not effectively capture important information, especially for multi-process malware, mainly due to the API call interleaving problem. Motivated by this, this paper presents API2Vec, a graph-based API embedding method for malware detection. First, we build a graph model to represent the raw sequence. In particular, we design the temporal process graph (TPG) to model inter-process behavior and the temporal API graph (TAG) to model intra-process behavior. With such graphs, we design a heuristic random walk algorithm to generate a number of paths that can capture fine-grained malware behavior. By pre-training on the paths using the Doc2Vec model, we are able to generate embeddings of paths and APIs, which can further be used for malware detection. Experiments on a real malware dataset demonstrate that API2Vec outperforms state-of-the-art embedding methods and detection methods in both accuracy and robustness, especially for multi-process malware.
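Illustrative sketch (not from the paper): the abstract does not give the construction of the TPG or TAG, nor the walk heuristic, so the following Python is only a rough illustration of the general idea of separating an interleaved trace by process, building a per-process graph with time-stamped edges, and sampling time-ordered paths from it. The function names, the (pid, api) trace format, the step-index edge labels, and the "strictly increasing step" walk rule are all assumptions made for this example.

```python
import random
from collections import defaultdict


def split_by_process(trace):
    """Group an interleaved trace of (pid, api) events into per-process sequences."""
    per_process = defaultdict(list)
    for pid, api in trace:
        per_process[pid].append(api)
    return dict(per_process)


def build_temporal_graph(sequence):
    """Map each API to a list of (next_api, step) edges, one per consecutive pair.

    The step index is an assumed stand-in for the temporal information
    that the abstract attributes to the temporal API graph.
    """
    graph = defaultdict(list)
    for step, (src, dst) in enumerate(zip(sequence, sequence[1:])):
        graph[src].append((dst, step))
    return graph


def time_respecting_walk(graph, start, max_len, rng):
    """Random walk that only follows edges whose step index strictly increases."""
    path, node, last_step = [start], start, -1
    while len(path) < max_len:
        candidates = [(dst, step) for dst, step in graph.get(node, [])
                      if step > last_step]
        if not candidates:
            break
        node, last_step = rng.choice(candidates)
        path.append(node)
    return path


# Hypothetical trace in which two processes interleave their API calls.
trace = [
    (1, "CreateFile"), (2, "RegOpenKey"), (1, "WriteFile"),
    (2, "RegSetValue"), (1, "CloseHandle"), (2, "RegCloseKey"),
]
sequences = split_by_process(trace)
graphs = {pid: build_temporal_graph(seq) for pid, seq in sequences.items()}
rng = random.Random(0)
paths = [time_respecting_walk(graphs[pid], seq[0], 10, rng)
         for pid, seq in sequences.items()]
```

In a pipeline like the one the abstract outlines, each sampled path could then be treated as one "document" for a Doc2Vec-style model; that training step is omitted here.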
Pages: 261-273
Page count: 13