DIComP: Lightweight Data-Driven Inference of Binary Compiler Provenance with High Accuracy

Citations: 7
Authors
Chen, Ligeng [1]
He, Zhongling [1]
Wu, Hao [1]
Xu, Fengyuan [1]
Qian, Yi [1]
Mao, Bing [1]
Affiliations
[1] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing, Jiangsu, Peoples R China
Source
2022 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE ANALYSIS, EVOLUTION AND REENGINEERING (SANER 2022) | 2022
Keywords
Binary Analysis; Compilation Options
DOI
10.1109/SANER53432.2022.00025
Chinese Library Classification number
TP31 [Computer Software]
Subject classification code
081202; 0835
Abstract
Binary analysis is widely used to assess software security and test for vulnerabilities without access to source code. The validity of such analysis depends heavily on the ability to infer information about how the code was compiled. Among this compilation information, the compiler type and optimization level are the key factors determining what binaries look like, yet they remain difficult to infer efficiently with existing tools. In this paper, we conduct a thorough empirical study of how binaries appear under various compilation settings and propose DIComP, a lightweight binary analysis tool based on the simplest machine learning method, which infers the compiler and optimization level from the most relevant features identified in our observations. Our comprehensive evaluations demonstrate that DIComP fully recognizes the compiler provenance and infers optimization levels with up to 90% accuracy. It is also efficient, inferring thousands of binaries at the millisecond level with a lightweight machine learning model (1 MB).
Pages: 112-122
Page count: 11