PEM: Representing Binary Program Semantics for Similarity Analysis via a Probabilistic Execution Model

Cited by: 2
Authors
Xu, Xiangzhe [1]
Xuan, Zhou [1]
Feng, Shiwei [1]
Cheng, Siyuan [1]
Ye, Yapeng [1]
Shi, Qingkai [1]
Tao, Guanhong [1]
Yu, Le [1]
Zhang, Zhuo [1]
Zhang, Xiangyu [1]
Affiliation
[1] Purdue Univ, W Lafayette, IN 47907 USA
Source
PROCEEDINGS OF THE 31ST ACM JOINT MEETING EUROPEAN SOFTWARE ENGINEERING CONFERENCE AND SYMPOSIUM ON THE FOUNDATIONS OF SOFTWARE ENGINEERING, ESEC/FSE 2023 | 2023
Keywords
Binary Similarity Analysis; Program Analysis;
DOI
10.1145/3611643.3616301
Chinese Library Classification (CLC) number
TP31 [Computer software]
Subject classification codes
081202; 0835
Abstract
Binary similarity analysis determines whether two binary executables are compiled from the same source program. Existing techniques leverage static and dynamic program features and may use advanced deep learning techniques. Although they have demonstrated great potential, the community believes that a more effective representation of program semantics can further improve similarity analysis. In this paper, we propose a new method to represent binary program semantics. It is based on a novel probabilistic execution engine that can effectively sample the input space and the program path space of subject binaries. More importantly, it ensures that the collected samples are comparable across binaries, addressing substantial variations in input specifications. Our evaluation on 9 real-world projects with 35k functions and a comparison with 6 state-of-the-art techniques show that PEM achieves a precision of 96% under common settings, outperforming the baselines by 10-20%.
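The abstract's central point, that execution samples are only useful for similarity if they are comparable across binaries, can be illustrated with a toy sketch. This is not PEM's implementation: Python functions stand in for binaries, every function is run on the same seeded stream of inputs so that the samples line up, and the values each function produces are compared as multisets using a Jaccard score. The names sample_values, multiset_jaccard, semantic_similarity, f_O0, f_O3 and g are hypothetical, and the sketch omits the program path-space sampling the abstract describes.

```python
import random
from collections import Counter


def sample_values(func, num_samples=200, seed=0):
    """Run func on inputs from one shared, seeded distribution and count every
    value it produces (a hypothetical stand-in for tracing a binary)."""
    rng = random.Random(seed)  # same seed => identical inputs for every function
    observed = Counter()
    for _ in range(num_samples):
        x = rng.randint(-100, 100)
        for value in func(x):
            observed[value] += 1
    return observed


def multiset_jaccard(a, b):
    """Jaccard similarity of two multisets: |min counts| / |max counts|."""
    intersection = sum((a & b).values())
    union = sum((a | b).values())
    return intersection / union if union else 1.0


def semantic_similarity(f, g, num_samples=200, seed=0):
    """Compare two functions by the values they produce on the same inputs."""
    return multiset_jaccard(sample_values(f, num_samples, seed),
                            sample_values(g, num_samples, seed))


def f_O0(x):
    # Hypothetical "unoptimized" variant: computes 2x, then 2x + 1.
    y = x * 2
    z = y + 1
    return [y, z]


def f_O3(x):
    # Hypothetical "optimized" variant: same values, different operations.
    return [x + x, (x << 1) | 1]


def g(x):
    # An unrelated function.
    return [x * x, x - 1]


if __name__ == "__main__":
    print(semantic_similarity(f_O0, f_O3))  # 1.0: identical values on identical inputs
    print(semantic_similarity(f_O0, g))     # much lower: different semantics
```

Sharing the seed is the design choice that makes the comparison meaningful: if each function drew its own random inputs, differing outputs could reflect different inputs rather than different semantics.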
Pages: 401-412
Page count: 12