DIRE: A Neural Approach to Decompiled Identifier Naming

Cited by: 48
Authors
Lacomis, Jeremy [1]
Yin, Pengcheng [1]
Schwartz, Edward J. [2]
Allamanis, Miltiadis [3]
Le Goues, Claire [1]
Neubig, Graham [1]
Vasilescu, Bogdan [1]
Affiliations
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[2] Carnegie Mellon Univ, Software Engn Inst, Pittsburgh, PA 15213 USA
[3] Microsoft Res, Redmond, WA USA
Source
34th IEEE/ACM International Conference on Automated Software Engineering (ASE 2019) | 2019
Funding
U.S. National Science Foundation
Keywords
DOI
10.1109/ASE.2019.00064
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
The decompiler is one of the most common tools for examining binaries without corresponding source code. It transforms binaries into high-level code, reversing the compilation process. Decompilers can reconstruct much of the information that is lost during the compilation process (e.g., structure and type information). Unfortunately, they do not reconstruct semantically meaningful variable names, which are known to increase code understandability. We propose the Decompiled Identifier Renaming Engine (DIRE), a novel probabilistic technique for variable name recovery that uses both lexical and structural information recovered by the decompiler. We also present a technique for generating corpora suitable for training and evaluating models of decompiled code renaming, which we use to create a corpus of 164,632 unique x86-64 binaries generated from C projects mined from GitHub. Our results show that on this corpus DIRE can predict variable names identical to the names in the original source code up to 74.3% of the time.
Pages: 640-651
Page count: 12