Adversarial Authorship Attribution in Open-Source Projects

Cited by: 6
Authors
Matyukhina, Alina [1]
Stakhanova, Natalia [2]
Dalla Preda, Mila [3]
Perley, Celine [1]
Affiliations
[1] Univ New Brunswick, Canadian Inst Cybersecur, Fredericton, NB, Canada
[2] Univ Saskatchewan, Dept Comp Sci, Saskatoon, SK, Canada
[3] Univ Verona, Dipartimento Informat, Verona, Italy
Source
PROCEEDINGS OF THE NINTH ACM CONFERENCE ON DATA AND APPLICATION SECURITY AND PRIVACY (CODASPY '19) | 2019
Keywords
Authorship attribution; obfuscation; imitation; open-source software; adversarial; attacks
DOI
10.1145/3292006.3300032
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Open-source software is open to anyone by design, whether a community of developers, hackers, or malicious users. Authors of open-source software typically hide their identity behind nicknames and avatars. However, they have no protection against authorship attribution techniques that can build software author profiles just by analyzing software characteristics. In this paper we present an author imitation attack that makes it possible to deceive current authorship attribution systems and mimic the coding style of a target developer. Within this context we explore the potential of the existing attribution techniques to be deceived. Our results show that we are able to imitate the coding style of developers based on data collected from the popular source code repository GitHub. To subvert the author imitation attack, we propose a novel author obfuscation approach that allows us to hide the coding style of the author. Unlike existing obfuscation tools, this new obfuscation technique uses transformations that preserve code readability. We assess the effectiveness of our attacks on several datasets produced by actual developers from GitHub and by participants of the GoogleCodeJam competition. Throughout our experiments we show that author hiding can be achieved by making sensible transformations, which reduce the likelihood of current authorship attribution systems identifying the author's style to 0%.
Pages: 291-302
Number of pages: 12