A Light Bug Triage Framework for Applying Large Pre-trained Language Model

Times cited: 22

Authors:

Lee, Jaehyung [1]

Han, Kisun [2]

Yu, Hwanjo [1]

Affiliations:

[1] Pohang University of Science and Technology (POSTECH), Pohang, Gyeongsangbuk-do, South Korea

[2] Samsung Research, Seoul, South Korea

Source:

Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering (ASE 2022), 2022

Keywords:

Bug triage; Pre-trained language model; BERT; Knowledge distillation; Catastrophic forgetting; Overthinking

DOI:

10.1145/3551349.3556898

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Subject classification code:

0812

Abstract:

Assigning appropriate developers to bugs is one of the main challenges in bug triage. Demand for automatic bug triage is growing in industry, because manual triage is labor-intensive and time-consuming in large projects. The key to the bug triage task is extracting semantic information from a bug report. In recent years, large Pre-trained Language Models (PLMs) such as BERT [4] have driven dramatic progress in natural language processing (NLP). However, applying large PLMs to extract semantic information for bug triage poses several challenges. In this paper, we address these challenges and propose a novel bug triage framework named LBT-P, standing for Light Bug Triage framework with a Pre-trained language model. LBT-P compresses a large PLM into small, fast models using knowledge distillation, and it prevents catastrophic forgetting of the PLM by introducing knowledge preservation fine-tuning. We also develop a new loss function that exploits representations from earlier layers as well as deeper layers to handle the overthinking problem. We evaluate the proposed framework on a real-world private dataset and on three public real-world datasets [11]: Google Chromium, Mozilla Core, and Mozilla Firefox. The experimental results show the superiority of LBT-P.
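The abstract names two generic ingredients: knowledge distillation (Hinton et al. [5]) and a loss that draws on earlier as well as deeper layers. The Python sketch below illustrates only these building blocks under stated assumptions: the temperature-scaled soft-target loss from [5], and a hypothetical weighted average of cross-entropy losses from classifiers attached to several layers. All function names, the temperature, the layer weights, and the example logits are illustrative assumptions; this is not the LBT-P loss, whose exact form is not given in this record.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_norm for z in scaled]


def distillation_loss(student_logits: Sequence[float],
                      teacher_logits: Sequence[float],
                      temperature: float = 2.0) -> float:
    """Soft-target loss in the style of Hinton et al. (2015): T^2 * KL(teacher || student)."""
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    kl = sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_p_teacher, log_p_student))
    # Scaling by T^2 keeps gradient magnitudes comparable across temperatures.
    return temperature ** 2 * kl


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Hard-label cross-entropy for a single example."""
    return -log_softmax(logits)[label]


def multi_layer_loss(exit_logits: Sequence[Sequence[float]],
                     label: int,
                     weights: Sequence[float]) -> float:
    """HYPOTHETICAL: weighted mean of cross-entropy over classifiers on several layers.

    exit_logits[i] holds the logits of a classifier attached to the i-th chosen
    layer (earlier layers first). This only illustrates supervising earlier as
    well as deeper layers; it is not the loss defined for LBT-P.
    """
    if len(exit_logits) != len(weights) or not weights:
        raise ValueError("need exactly one weight per exit classifier")
    total = sum(w * cross_entropy(l, label) for w, l in zip(weights, exit_logits))
    return total / sum(weights)


if __name__ == "__main__":
    teacher = [2.0, 0.5, -1.0]  # made-up logits from a large teacher model
    student = [1.5, 0.7, -0.8]  # made-up logits from a small student model
    print(distillation_loss(student, teacher))
    print(multi_layer_loss([[0.2, 0.1, 0.0], student], label=0, weights=[0.3, 0.7]))
```

Working in log space avoids taking the logarithm of a probability that has underflowed to zero; the T^2 factor follows the convention described in [5].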

Pages: 11

References (25 in total)

[1] Chen, Songqiang; Xie, Xiaoyuan; Yin, Bangguo; Ji, Yuanxiang; Chen, Lin; Xu, Baowen. Stay Professional and Efficient: Automatically Generate Titles for Your Bug Reports. 2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE 2020), 2020, pp. 385-397.

[2] Clark K. 8th International Conference on Learning Representations (ICLR 2020), 2020. DOI: 10.48550/arXiv.2003.10555.

[3] Dedik, Vaclav; Rossi, Bruno. Automated Bug Triaging in an Industrial Context. 2016 42nd Euromicro Conference on Software Engineering and Advanced Applications (SEAA), 2016, pp. 363-367.

[4] Devlin J. 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2019), Vol. 1, 2019, p. 4171.

[5] Hinton G. arXiv, 2015. arXiv:1503.02531.

[6] Jonsson, Leif; Borg, Markus; Broman, David; Sandahl, Kristian; Eldh, Sigrid; Runeson, Per. Automated bug assignment: Ensemble-based machine learning in large scale industrial contexts. Empirical Software Engineering, 2016, 21(4): 1533-1578.

[7] Kaya Y. Proceedings of Machine Learning Research (PMLR), Vol. 97, 2019.

[8] Kingma DP. Advances in Neural Information Processing Systems, Vol. 27, 2014.

[9] Lan ZZ. arXiv, 2020. arXiv:1909.11942, DOI: 10.48550/arXiv.1909.11942.

[10] Liu YH. arXiv, 2019. arXiv:1907.11692.
