CodeExtract: Enhancing Binary Code Similarity Detection with Code Extraction Techniques

Authors
Jia, Lichen [1, 2]
Wu, Chenggang [1, 2]
Zhang, Peihua [1, 2]
Wang, Zhe [1, 2]
Affiliations
[1] Chinese Academy of Sciences, Institute of Computing Technology, SKLP, Beijing, People's Republic of China
[2] University of Chinese Academy of Sciences, Beijing, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Learning-based Binary Similarity Analysis; Function Inline; Program Analysis; ALGORITHM
DOI
10.1145/3652032.3657572
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
Binary code similarity detection (BCSD) takes a function in binary form and identifies the set of functions most similar to it. These similar functions often originate from the same source code but differ because of variations in compilation settings. Such analysis is crucial for security applications, including vulnerability discovery, malware detection, software plagiarism detection, and patch analysis. Function inlining, a compiler optimization, embeds the code of a callee function directly into its caller. Because different compilation options (such as O1 and O3) lead to different levels of function inlining, binary functions derived from the same source code can differ significantly across compilation settings, which reduces the accuracy of state-of-the-art (SOTA) learning-based binary code similarity detection (LB-BCSD) methods. In contrast to function inlining, code extraction identifies duplicate code within a program, separates it out, and replaces it with corresponding function calls. To overcome the impact of function inlining, this paper introduces a novel approach, CodeExtract. It first uses code extraction techniques to transform code introduced by function inlining back into function calls. It then actively inlines the functions that cannot undergo code extraction, effectively eliminating the differences introduced by function inlining. Experimental validation shows that CodeExtract improves the accuracy of LB-BCSD models by 20% when addressing the challenges posed by function inlining.
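The relationship between function inlining and code extraction described in the abstract can be illustrated with a toy sketch. This is not the CodeExtract algorithm from the paper: it assumes a hypothetical model in which a function body is a flat list of instruction strings, and the names `inline`, `extract`, `helper`, `caller_call`, and `caller_inlined` are invented for illustration only. Real binaries require program analysis that this sketch does not attempt.

```python
# Toy model (an assumption, not the paper's representation):
# a function body is a list of instruction strings, and
# "call NAME" marks a call site.

def inline(body, callee_name, callee_body):
    """Replace every call to callee_name with a copy of callee_body."""
    out = []
    for ins in body:
        if ins == f"call {callee_name}":
            out.extend(callee_body)
        else:
            out.append(ins)
    return out


def extract(body, callee_name, callee_body):
    """Replace every exact occurrence of callee_body with a call."""
    n = len(callee_body)
    if n == 0:
        return list(body)
    out, i = [], 0
    while i < len(body):
        if body[i:i + n] == callee_body:
            out.append(f"call {callee_name}")
            i += n
        else:
            out.append(body[i])
            i += 1
    return out


# Hypothetical callee and a caller that invokes it twice.
helper = ["load x", "add 1", "store x"]
caller_call = ["push", "call helper", "mul 2", "call helper", "ret"]

# Inlining makes the caller look very different...
caller_inlined = inline(caller_call, "helper", helper)
# ...and extraction maps the inlined form back to the call form.
recovered = extract(caller_inlined, "helper", helper)

print(caller_inlined)
print(recovered == caller_call)
```

In this simplified setting, extraction undoes inlining exactly, so both variants of the caller normalize to the same form before comparison. Exact sequence matching is a deliberate simplification; compiled code that has been inlined is usually further optimized and will not match the callee body verbatim.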
Pages: 143-154
Page count: 12