API2Vec: Learning Representations of API Sequences for Malware Detection

Cited by: 11
Authors
Cui, Lei [1]
Cui, Jiancong [2, 3]
Ji, Yuede [4]
Hao, Zhiyu [1]
Li, Lun [3]
Ding, Zhenquan [3]
Affiliations
[1] Zhongguancun Lab, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Sch Cyber Secur, Beijing, Peoples R China
[3] Chinese Acad Sci, Inst Informat Engn, Beijing, Peoples R China
[4] Univ North Texas, Dallas, TX USA
Source
PROCEEDINGS OF THE 32ND ACM SIGSOFT INTERNATIONAL SYMPOSIUM ON SOFTWARE TESTING AND ANALYSIS, ISSTA 2023 | 2023
Funding
National Natural Science Foundation of China
Keywords
Malware Detection; Embedding; Deep Learning; Random Walk; CLASSIFICATION;
DOI
10.1145/3597926.3598054
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
Analyzing malware based on API call sequences is an effective approach, as the sequence reflects the dynamic execution behavior of malware. Recent advances in deep learning have led to the application of these techniques for mining useful information from API call sequences. However, these methods mainly operate on raw sequences and may not effectively capture important information, especially for multi-process malware, mainly because of the API call interleaving problem. Motivated by this, this paper presents API2Vec, a graph-based API embedding method for malware detection. First, we build a graph model to represent the raw sequence. In particular, we design the temporal process graph (TPG) to model inter-process behavior and the temporal API graph (TAG) to model intra-process behavior. With these graphs, we design a heuristic random walk algorithm to generate a number of paths that capture fine-grained malware behavior. By pre-training on the paths with the Doc2Vec model, we generate embeddings of paths and APIs, which can then be used for malware detection. Experiments on a real malware dataset demonstrate that API2Vec outperforms state-of-the-art embedding methods and detection methods in both accuracy and robustness, especially for multi-process malware.
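The general idea in the abstract (untangle an interleaved trace per process, model each process as a graph, then sample paths by random walk) can be illustrated with a minimal sketch. This is not the paper's TPG/TAG construction or its heuristic walk, neither of which is specified in this record; it assumes a plain first-order transition graph and a frequency-weighted walk as stand-ins. The trace, the process IDs, and the helper names `group_by_process`, `build_transition_graph`, and `random_walk` are all hypothetical.

```python
import random
from collections import defaultdict


def group_by_process(calls):
    """Split an interleaved trace of (pid, api_name) events into per-process sequences."""
    per_process = defaultdict(list)
    for pid, api in calls:
        per_process[pid].append(api)
    return dict(per_process)


def build_transition_graph(sequence):
    """Directed graph where edge a -> b counts how often b directly follows a.

    Assumption: a simplified stand-in, not the paper's temporal API graph.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for a, b in zip(sequence, sequence[1:]):
        graph[a][b] += 1
    return graph


def random_walk(graph, start, max_length, rng):
    """Frequency-weighted random walk; stops early at a node with no successors.

    Assumption: a plain weighted walk, not the paper's heuristic algorithm.
    """
    path = [start]
    while len(path) < max_length:
        successors = graph.get(path[-1])
        if not successors:
            break
        nodes = list(successors)
        weights = [successors[n] for n in nodes]
        path.append(rng.choices(nodes, weights=weights, k=1)[0])
    return path


# Hypothetical interleaved trace from two processes (PIDs 100 and 200).
trace = [
    (100, "CreateFileW"), (200, "RegOpenKeyExW"), (100, "WriteFile"),
    (200, "RegSetValueExW"), (100, "CloseHandle"), (200, "RegCloseKey"),
]

rng = random.Random(0)
per_process = group_by_process(trace)
paths = [
    random_walk(build_transition_graph(seq), seq[0], 5, rng)
    for seq in per_process.values()
]
```

Each sampled path is a list of API names, so it could be passed as one "document" to a Doc2Vec implementation (for example, the one in gensim) to obtain path and API embeddings, which is the role the abstract assigns to Doc2Vec.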
Pages: 261-273
Page count: 13