GitHub Issue Classification Using BERT-Style Models

Cited by: 14
Authors
Bharadwaj, Shikhar [1]
Kadam, Tushar [1]
Affiliations
[1] Indian Inst Sci, Bengaluru, Karnataka, India
Source
2022 IEEE/ACM 1ST INTERNATIONAL WORKSHOP ON NATURAL LANGUAGE-BASED SOFTWARE ENGINEERING (NLBSE 2022) | 2022
Keywords
NLP; BERT; text classification
DOI
10.1145/3528588.3528663
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recent innovations in natural language processing techniques have led to the development of various tools for assisting software developers. This paper reports our solution to the issue report classification task from the NL-Based Software Engineering workshop. We approach the task of classifying issues on GitHub repositories using BERT-style models [1, 2, 6, 8]. We propose a neural architecture for the problem that uses contextual embeddings of the text content in GitHub issues. We also design additional features for the classification task. We perform a thorough ablation analysis of the designed features and benchmark various BERT-style models for generating textual embeddings. Our proposed solution performs better than the competition organizer's method and achieves an F1 score of 0.8653. Our code and trained models are available at https://github.com/Kadam-Tushar/Issue-Classifier.
Pages: 40-43
Number of pages: 4
Related papers
8 in total
[1] Devlin J, 2019, arXiv, DOI [arXiv:1810.04805, 10.48550/arXiv.1810.04805]
[2] Feng ZY, 2020, FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2020, P1536
[3] Kallis, Rafael; Di Sorbo, Andrea; Canfora, Gerardo; Panichella, Sebastiano. Predicting issue types on GitHub [J]. SCIENCE OF COMPUTER PROGRAMMING, 2021, V205
[4] Kallis, Rafael; Di Sorbo, Andrea; Canfora, Gerardo; Panichella, Sebastiano. Ticket Tagger: Machine Learning Driven Issue Classification [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE MAINTENANCE AND EVOLUTION (ICSME 2019), 2019, P406-409
[5] Kallis Rafael, 2022, P 1 INT WORKSHOP NAT
[6] Liu YH, 2019, arXiv, DOI arXiv:1907.11692
[7] van der Maaten L, 2008, J MACH LEARN RES, V9, P2579
[8] Yang ZL, 2019, ADV NEUR IN, V32