Neutron: an attention-based neural decompiler

Cited by: 7
Authors
Liang, Ruigang [1 ,2 ]
Cao, Ying [1 ,2 ]
Hu, Peiwei [1 ,2 ]
Chen, Kai [1 ,2 ]
Affiliations
[1] Chinese Acad Sci, Inst Informat Engn, SKLOIS, Beijing 100093, Peoples R China
[2] Univ Chinese Acad Sci, Sch Cyber Secur, Beijing 100049, Peoples R China
Keywords
Decompilation; LSTM; Attention; Translation;
DOI
10.1186/s42400-021-00070-0
Chinese Library Classification: TP [Automation technology; computer technology];
Discipline code: 0812
Abstract
Decompilation aims to analyze and transform low-level program language (PL) code, such as binary or assembly code, into an equivalent high-level PL. Decompilation plays a vital role in cyberspace security fields such as software vulnerability discovery and analysis and malicious code detection and analysis, as well as in software engineering fields such as source code analysis, optimization, and cross-language, cross-operating-system migration. Unfortunately, existing decompilers mainly rely on experts to write rules, which leads to bottlenecks such as low scalability, development difficulty, and long development cycles. Moreover, the generated high-level PL code often violates code writing conventions, and its readability remains relatively low. These problems hinder the efficiency of advanced applications (e.g., vulnerability discovery) built on decompiled high-level PL code. In this paper, we propose a decompilation approach based on the attention-based neural machine translation (NMT) mechanism, which converts low-level PL into high-level PL while improving legibility and remaining functionally similar. To compensate for the information asymmetry between low-level and high-level PLs, we design a translation method based on the basic operations of low-level PL. This method improves the generalization of the NMT model and captures the translation rules between PLs more accurately and efficiently. We implement this approach in a neural decompilation framework called Neutron. Evaluation on two practical applications shows that Neutron achieves an average program accuracy of 96.96%, outperforming the traditional NMT model.
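The abstract frames decompilation as attention-based neural machine translation (an LSTM encoder-decoder with attention, per the keywords). As a rough illustration of the attention step only, and not Neutron's actual implementation, a Luong-style dot-product attention over encoder hidden states can be sketched as follows; all names and the toy dimensions here are illustrative assumptions:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def dot_product_attention(decoder_state, encoder_states):
    """One attention step: score each encoder position against the
    current decoder state, normalize to weights, and return the
    weighted sum (context vector) fed into the next prediction."""
    scores = encoder_states @ decoder_state      # (T,) alignment scores
    weights = softmax(scores)                    # (T,) attention weights
    context = weights @ encoder_states           # (H,) context vector
    return weights, context

# Toy example: 4 encoder positions (e.g., assembly tokens), hidden size 3.
enc = np.array([[1.0, 0.0, 0.0],
                [0.0, 1.0, 0.0],
                [0.0, 0.0, 1.0],
                [1.0, 1.0, 0.0]])
dec = np.array([1.0, 0.0, 0.0])  # current decoder hidden state

w, ctx = dot_product_attention(dec, enc)
```

In an NMT-style decompiler, the encoder would read the low-level token sequence and the decoder would emit high-level PL tokens, with the context vector letting each output token attend to the relevant low-level instructions.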
Pages: 13