A Practical Black-Box Attack on Source Code Authorship Identification Classifiers

Times cited: 11
Authors
Liu, Qianjun [1]
Ji, Shouling [1]
Liu, Changchang [2]
Wu, Chunming [1]
Affiliations
[1] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou 310027, Peoples R China
[2] IBM Thomas J Watson Res Ctr, Dept Distributed AI, Yorktown Hts, NY 10598 USA
Funding
National Natural Science Foundation of China;
Keywords
Feature extraction; Tools; Training; Syntactics; Predictive models; Perturbation methods; Transforms; Source code; authorship identification; adversarial stylometry; Robustness;
DOI
10.1109/TIFS.2021.3080507
Chinese Library Classification (CLC) number
TP301 [Theory and methods];
Discipline classification code
081202;
Abstract
Recent research has shown that adversarial stylometry of source code can confuse source code authorship identification (SCAI) models, which may threaten the security of related applications such as programmer attribution and software forensics. In this work, we propose source code authorship disguise (SCAD), which automatically hides programmers' identities from authorship identification. SCAD is more practical than previous work, which requires knowledge of the output probabilities or internal details of the target SCAI model. Specifically, SCAD trains a substitute model and develops a set of semantically equivalent transformations; guided by these, it modifies the original code toward a disguised style through small manipulations of lexical and syntactic features. When evaluated under fully black-box settings on a real-world dataset of 1,600 programmers, SCAD causes state-of-the-art SCAI models to exhibit misclassification rates above 30%. Multiple metrics further demonstrate SCAD's efficiency and its ability to preserve code utility. Our work can also serve as a guideline for developing more robust identification methods in the future.
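The abstract describes a black-box loop in which a locally trained substitute model decides which semantics-preserving transformations to apply. The Python sketch below illustrates that general idea only; it is not SCAD's implementation. The transformations (`rename_identifier`, `postfix_to_compound`, `tabs_to_spaces`), the scoring function `toy_substitute_confidence`, and the greedy search in `disguise` are all hypothetical stand-ins chosen for illustration, and the regex rewrites are a simplification of what a parser-based tool would do.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical semantics-preserving rewrites for C-like source text.
# Regexes are a simplification; a real tool would rewrite a parse tree.

def rename_identifier(code: str, old: str, new: str) -> str:
    """Rename an identifier wherever it appears as a whole word."""
    return re.sub(rf"\b{re.escape(old)}\b", new, code)

def postfix_to_compound(code: str) -> str:
    """Rewrite statements like `count++;` as `count += 1;`.
    Naive: would misfire on expressions such as `*p++;`."""
    return re.sub(r"\b(\w+)\+\+;", r"\1 += 1;", code)

def tabs_to_spaces(code: str) -> str:
    """Change a layout habit: indent with four spaces instead of tabs."""
    return code.replace("\t", "    ")

TRANSFORMS: List[Callable[[str], str]] = [
    lambda c: rename_identifier(c, "i", "idx"),
    postfix_to_compound,
    tabs_to_spaces,
]

def toy_substitute_confidence(code: str) -> float:
    """Hypothetical stand-in for a trained substitute model: how strongly the
    code matches the original author's style (higher = more confident)."""
    score = (0.3 * code.count("++;")
             + 0.2 * len(re.findall(r"\bi\b", code))
             + 0.1 * code.count("\t"))
    return score / (1.0 + score)  # squash into [0, 1)

def disguise(code: str,
             transforms: List[Callable[[str], str]],
             confidence: Callable[[str], float],
             max_steps: int = 10) -> Tuple[str, float]:
    """Greedily apply the transformation that most lowers the substitute
    model's confidence in the true author; stop when nothing helps."""
    current, best = code, confidence(code)
    for _ in range(max_steps):
        candidates = [t(current) for t in transforms]
        # Drop transformations that no longer change the code.
        scored = [(confidence(c), c) for c in candidates if c != current]
        if not scored:
            break
        new_score, new_code = min(scored, key=lambda pair: pair[0])
        if new_score >= best:
            break
        current, best = new_code, new_score
    return current, best

sample = "for (int i = 0; i < n; i++) {\n\tsum += a[i];\n\tcount++;\n}\n"
disguised, conf = disguise(sample, TRANSFORMS, toy_substitute_confidence)
print(disguised)
print(f"substitute confidence: {toy_substitute_confidence(sample):.3f} -> {conf:.3f}")
```

On this sample, the loop first renames `i` to `idx`, then rewrites `count++;`, then replaces tabs, after which no transformation lowers the toy score further. Greedy selection is only one plausible strategy; the abstract does not say how SCAD chooses among transformations.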
Pages: 3620-3633
Page count: 14