Deep Metric Learning for Code Authorship Attribution and Verification

Cited by: 2
Authors
White, Riley [1]
Sprague, Nathan [1]
Affiliations
[1] James Madison Univ, Dept Comp Sci, Harrisonburg, VA 22807 USA
Source
20TH IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA 2021) | 2021
Keywords
Authorship identification; authorship verification; metric learning; stylometry; deep learning; malware recognition;
DOI
10.1109/ICMLA52953.2021.00178
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Code authorship identification can assist in identifying creators of malware, identifying plagiarism, and giving insights in copyright infringement cases. Taking inspiration from facial recognition work, we apply recent advances in metric learning to the problem of authorship identification and verification. The metric learning approach makes it possible to measure similarity in the learned embedding space. Access to a discriminative similarity measure allows for the estimation of probability distributions that facilitate open-set classification and verification. We extend our analysis to verification based on sets of files, a previously unexplored problem domain in large-scale author identification. On closed-set tasks we achieve competitive accuracies, but do not improve on the state of the art.
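The idea of verification from similarity in an embedding space can be illustrated with a minimal sketch. It is not the authors' method: the abstract describes estimating probability distributions over the similarity measure, whereas this sketch assumes a fixed cosine-similarity threshold, and it assumes a set of files is summarized by the mean of its embeddings. The function names, the toy 2-dimensional embeddings, and the threshold value 0.8 are all hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_embedding(embeddings):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(embeddings)
    return [sum(column) / n for column in zip(*embeddings)]


def same_author(set_a, set_b, threshold=0.8):
    """Hypothetical set-based verification rule.

    Each argument is a list of per-file embeddings, assumed to come from
    some trained embedding model. The two sets are accepted as the same
    author when their mean embeddings are similar enough. The threshold
    is an assumed value, not one taken from the paper.
    """
    score = cosine_similarity(mean_embedding(set_a), mean_embedding(set_b))
    return score >= threshold, score


# Toy embeddings standing in for model outputs (assumed values).
known_files = [[1.0, 0.0], [0.9, 0.1]]
query_similar = [[1.0, 0.1]]
query_different = [[0.0, 1.0]]

print(same_author(known_files, query_similar))
print(same_author(known_files, query_different))
```

In an open-set setting, a query whose score falls below the threshold for every known author would be rejected as belonging to an unknown author rather than forced into the closest class.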
Pages: 1089-1093
Page count: 5
Related papers
18 in total
  • [1] Code authorship identification using convolutional neural networks
    Abuhamad, Mohammed
    Rhim, Ji-su
    AbuHmed, Tamer
    Ullah, Sana
    Kang, Sanggil
    Nyang, DaeHun
    [J]. FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2019, 95 : 104 - 115
  • [2] Large-Scale and Language-Oblivious Code Authorship Identification
    Abuhamad, Mohammed
    AbuHmed, Tamer
    Mohaisen, Aziz
    Nyang, DaeHun
    [J]. PROCEEDINGS OF THE 2018 ACM SIGSAC CONFERENCE ON COMPUTER AND COMMUNICATIONS SECURITY (CCS'18), 2018, : 101 - 114
  • [3] [Anonymous], 2009, P SIAM DATA MINING
  • [4] Caliskan-Islam A, 2015, PROCEEDINGS OF THE 24TH USENIX SECURITY SYMPOSIUM, P255
  • [5] Chen T, 2020, PR MACH LEARN RES, V119
  • [6] Learning a similarity metric discriminatively, with application to face verification
    Chopra, S
    Hadsell, R
    LeCun, Y
    [J]. 2005 IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, VOL 1, PROCEEDINGS, 2005, : 539 - 546
  • [7] Deng J., 2018, PROC CVPR IEEE, DOI DOI 10.1109/CVPR.2019.00482
  • [8] Deep Metric Learning Using Triplet Network
    Hoffer, Elad
    Ailon, Nir
    [J]. SIMILARITY-BASED PATTERN RECOGNITION, SIMBAD 2015, 2015, 9370 : 84 - 92
  • [9] Significance of Softmax-Based Features in Comparison to Distance Metric Learning-Based Features
    Horiguchi, Shota
    Ikami, Daiki
    Aizawa, Kiyoharu
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2020, 42 (05) : 1279 - 1285
  • [10] Code Authorship Attribution: Methods and Challenges
    Kalgutkar, Vaibhavi
    Kaur, Ratinder
    Gonzalez, Hugo
    Stakhanova, Natalia
    Matyukhina, Alina
    [J]. ACM COMPUTING SURVEYS, 2019, 52 (01)