Can Large Language Models Comprehend Code Stylometry?

Cited by: 0
Authors
Dipongkor, Atish Kumar [1]
Affiliation
[1] Univ Cent Florida, Orlando, FL 32816 USA
Source
PROCEEDINGS OF 2024 39TH ACM/IEEE INTERNATIONAL CONFERENCE ON AUTOMATED SOFTWARE ENGINEERING, ASE 2024 | 2024
DOI
10.1145/3691620.3695370
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Code Authorship Attribution (CAA) has several applications, such as copyright disputes, plagiarism detection, and criminal prosecution. Existing studies have mainly addressed CAA by proposing Machine Learning (ML) and Deep Learning (DL) based techniques. The main limitations of ML-based techniques are that (a) manual feature engineering is required to train these models and (b) they are vulnerable to adversarial attacks. In this study, we initially fine-tune five Large Language Models (LLMs) for CAA and evaluate their performance. Our results show that LLMs are robust and less vulnerable than existing techniques on the CAA task.
Pages: 2429-2431
Page count: 3
Related papers
26 in total
[1]   Large-scale and Robust Code Authorship Identification with Deep Feature Learning [J].
Abuhamad, Mohammed ;
Abuhmed, Tamer ;
Mohaisen, David ;
Nyang, Daehun .
ACM TRANSACTIONS ON PRIVACY AND SECURITY, 2021, 24 (04)
[2]   Code authorship identification using convolutional neural networks [J].
Abuhamad, Mohammed ;
Rhim, Ji-su ;
AbuHmed, Tamer ;
Ullah, Sana ;
Kang, Sanggil ;
Nyang, DaeHun .
FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2019, 95 :104-115
[3]   Large-Scale and Language-Oblivious Code Authorship Identification [J].
Abuhamad, Mohammed ;
AbuHmed, Tamer ;
Mohaisen, Aziz ;
Nyang, DaeHun .
PROCEEDINGS OF THE 2018 ACM SIGSAC CONFERENCE ON COMPUTER AND COMMUNICATIONS SECURITY (CCS'18), 2018, :101-114
[4]   Authorship Attribution of Source Code: A Language-Agnostic Approach and Applicability in Software Engineering [J].
Bogomolov, Egor ;
Kovalenko, Vladimir ;
Rebryk, Yurii ;
Bacchelli, Alberto ;
Bryksin, Timofey .
PROCEEDINGS OF THE 29TH ACM JOINT MEETING ON EUROPEAN SOFTWARE ENGINEERING CONFERENCE AND SYMPOSIUM ON THE FOUNDATIONS OF SOFTWARE ENGINEERING (ESEC/FSE '21), 2021, :932-944
[5]  
Burrows S., 2007, P TWELTH AUSTRALASIA, P32
[6]   Comparing techniques for authorship attribution of source code [J].
Burrows, Steven ;
Uitdenbogerd, Alexandra L. ;
Turpin, Andrew .
SOFTWARE-PRACTICE & EXPERIENCE, 2014, 44 (01) :1-32
[7]  
Caliskan-Islam A, 2015, PROCEEDINGS OF THE 24TH USENIX SECURITY SYMPOSIUM, P255
[8]   An Extensive Study on Adversarial Attack against Pre-trained Models of Code [J].
Du, Xiaohu ;
Wen, Ming ;
Wei, Zichao ;
Wang, Shangwen ;
Jin, Hai .
PROCEEDINGS OF THE 31ST ACM JOINT MEETING EUROPEAN SOFTWARE ENGINEERING CONFERENCE AND SYMPOSIUM ON THE FOUNDATIONS OF SOFTWARE ENGINEERING, ESEC/FSE 2023, 2023, :489-501
[9]  
Feng ZY, 2020, Arxiv, DOI arXiv:2002.08155
[10]  
Frantzeskou G., 2006, 28th International Conference on Software Engineering Proceedings, P893, DOI 10.1145/1134285.1134445