Overview of the PAN@FIRE 2020 Task on the Authorship Identification of SOurce COde

Cited by: 0

Authors:

Fadel, Ali [1]

Musleh, Husam [1]

Tuffaha, Ibraheem [1]

Al-Ayyoub, Mahmoud [1]

Jararweh, Yaser [2]

Benkhelifa, Elhadj [3]

Affiliations:

[1] Jordan Univ Sci & Technol, Irbid, Jordan

[2] Duquesne Univ, Pittsburgh, PA 15219 USA

[3] Staffordshire Univ, Stoke On Trent, Staffs, England

Source:

PROCEEDINGS OF THE 12TH ANNUAL MEETING OF THE FORUM FOR INFORMATION RETRIEVAL EVALUATION (FIRE 2020) | 2020

Keywords:

authorship-identification; source-code; datasets;

DOI:

10.1145/3441501.3441532

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Authorship identification is essential for detecting deceptive misuse of others' content and for exposing the owners of anonymous malicious content. While it is widely studied for natural languages, it is rarely considered for programming languages. Accordingly, a PAN@FIRE task named Authorship Identification of SOurce COde (AI-SOCO) is proposed, focusing on the identification of source code authors. The dataset consists of crawled source code submitted to the CodeForces online judge platform by the top 1,000 human users with 100 or more correct C++ submissions. Participating systems are asked to predict the author of a given source code from the predefined list of code authors. In total, 60 teams registered on the task's CodaLab page; of these, 14 teams submitted 94 runs. The results are surprisingly high, with many teams and baselines breaking the 90% accuracy barrier. These systems used a wide range of models and techniques, from pretrained word embeddings (especially those tweaked to handle source code) to stylometric features.

Citation

Pages: 4-8

Page count: 5
