PEM: Representing Binary Program Semantics for Similarity Analysis via a Probabilistic Execution Model

Cited by: 2
Authors
Xu, Xiangzhe [1]
Xuan, Zhou [1]
Feng, Shiwei [1]
Cheng, Siyuan [1]
Ye, Yapeng [1]
Shi, Qingkai [1]
Tao, Guanhong [1]
Yu, Le [1]
Zhang, Zhuo [1]
Zhang, Xiangyu [1]
Affiliations
[1] Purdue University, West Lafayette, IN 47907, USA
Source
PROCEEDINGS OF THE 31ST ACM JOINT MEETING EUROPEAN SOFTWARE ENGINEERING CONFERENCE AND SYMPOSIUM ON THE FOUNDATIONS OF SOFTWARE ENGINEERING, ESEC/FSE 2023 | 2023
Keywords
Binary Similarity Analysis; Program Analysis
DOI
10.1145/3611643.3616301
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Binary similarity analysis determines if two binary executables are from the same source program. Existing techniques leverage static and dynamic program features and may utilize advanced Deep Learning techniques. Although they have demonstrated great potential, the community believes that a more effective representation of program semantics can further improve similarity analysis. In this paper, we propose a new method to represent binary program semantics. It is based on a novel probabilistic execution engine that can effectively sample the input space and the program path space of subject binaries. More importantly, it ensures that the collected samples are comparable across binaries, addressing the substantial variations of input specifications. Our evaluation on 9 real-world projects with 35k functions and our comparison with 6 state-of-the-art techniques show that PEM can achieve a precision of 96% with common settings, outperforming the baselines by 10-20%.
Pages: 401-412
Page count: 12