Boosting Neural Networks to Decompile Optimized Binaries

Cited by: 5
Authors
Cao, Ying [1, 2]
Liang, Ruigang [1, 2]
Chen, Kai [1, 2, 3]
Hu, Peiwei [1, 2]
Affiliations
[1] Chinese Acad Sci, IIE, SKLOIS, Beijing, Peoples R China
[2] UCAS, Sch CyberSecur, Beijing, Peoples R China
[3] BAAI, Beijing, Peoples R China
Source
PROCEEDINGS OF THE 38TH ANNUAL COMPUTER SECURITY APPLICATIONS CONFERENCE, ACSAC 2022 | 2022
Funding
Beijing Natural Science Foundation
Keywords
DOI
10.1145/3564625.3567998
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Decompilation aims to transform a low-level program language (LPL) (e.g., a binary file) into its functionally equivalent high-level program language (HPL) (e.g., C/C++). It is a core technology in software security, especially in vulnerability discovery and malware analysis. In recent years, with the successful application of neural machine translation (NMT) models in natural language processing (NLP), researchers have tried to build neural decompilers by borrowing the idea of NMT. They formulate the decompilation process as a translation problem between LPL and HPL, aiming to reduce the human cost required to develop decompilation tools and improve their generalizability. However, state-of-the-art learning-based decompilers do not cope well with compiler-optimized binaries. Since real-world binaries are mostly compiler-optimized, decompilers that do not consider optimized binaries have limited practical significance. In this paper, we propose a novel learning-based approach named NeurDP that targets compiler-optimized binaries. NeurDP uses a graph neural network (GNN) model to convert LPL to an intermediate representation (IR), which bridges the gap between source code and optimized binary. We also design an Optimized Translation Unit (OTU) to split functions into smaller code fragments for better translation performance. Evaluation results on datasets containing various types of statements show that NeurDP can decompile optimized binaries with 45.21% higher accuracy than state-of-the-art neural decompilation frameworks.
Pages: 508-518
Number of pages: 11