Achieving Reliable Sentiment Analysis in the Software Engineering Domain using BERT

Times cited: 43

Authors:

Biswas, Eeshita [1]

Karabulut, Mehmet Efruz [1]

Pollock, Lori [1]

Vijay-Shanker, K. [1]

Affiliation:

[1] University of Delaware, Computer & Information Sciences, Newark, DE 19716, USA

Source:

2020 IEEE International Conference on Software Maintenance and Evolution (ICSME 2020) | 2020

Keywords:

Sentiment Analysis; Software Engineering; BERT; ANALYSIS TOOLS; IMPROVE

DOI:

10.1109/ICSME46990.2020.00025

Chinese Library Classification (CLC) number:

TP31 [Computer Software]

Subject classification codes:

081202; 0835

Abstract:

Researchers have shown that sentiment analysis of software artifacts can potentially improve various software engineering tools, including API and library recommendation systems, code suggestion tools, and tools for improving communication among software developers. However, sentiment analysis techniques applied to software artifacts have not yet yielded very high accuracy. Recent adaptations of sentiment analysis tools to the software domain have reported some improvements, but the f-measures for the positive and negative sentences still remain in the 0.4-0.64 range, which limits their practical usefulness for software engineering tools. In this paper, we explore the potential effectiveness of customizing BERT, a language representation model that has recently achieved very good results on various Natural Language Processing tasks on English texts, for the task of sentiment analysis of software artifacts. We describe our application of BERT to analyzing sentiments of sentences in Stack Overflow posts and compare the impact of a BERT sentiment classifier to state-of-the-art sentiment analysis techniques when used on a domain-specific data set created from Stack Overflow posts. We also investigate how the performance of sentiment analysis changes when using a much larger (3 times) data set than previous studies. Our results show that the BERT classifier achieves reliable performance for sentiment analysis of software engineering texts. BERT combined with the larger data set achieves an overall f-measure of 0.87, with the f-measures for the negative and positive sentences reaching 0.91 and 0.78 respectively, a significant improvement over the state-of-the-art.
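The abstract reports separate f-measures for negative and positive sentences alongside an overall f-measure, but does not say how the overall figure is aggregated. As a minimal sketch (plain Python, standard library only), the functions below compute one-vs-rest precision, recall, and F1 for each class, plus macro and support-weighted averages. The three-label scheme (`"pos"`, `"neg"`, `"neu"`), the function names, and the toy labels are illustrative assumptions, not the paper's data or evaluation code.

```python
from collections import Counter


def per_class_f1(gold, pred, labels):
    """Return {label: (precision, recall, f1)} using one-vs-rest counts."""
    scores = {}
    for lab in labels:
        pairs = list(zip(gold, pred))
        tp = sum(1 for g, p in pairs if g == lab and p == lab)
        fp = sum(1 for g, p in pairs if g != lab and p == lab)
        fn = sum(1 for g, p in pairs if g == lab and p != lab)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[lab] = (precision, recall, f1)
    return scores


def macro_f1(scores):
    """Unweighted mean of the per-class F1 values."""
    return sum(f for _, _, f in scores.values()) / len(scores)


def weighted_f1(scores, gold):
    """Mean of per-class F1 values weighted by each class's gold support."""
    support = Counter(gold)
    total = sum(support[lab] for lab in scores)
    return sum(scores[lab][2] * support[lab] for lab in scores) / total


if __name__ == "__main__":
    # Hypothetical toy labels, not data from the paper.
    gold = ["neg", "neg", "pos", "neu", "neu", "pos"]
    pred = ["neg", "pos", "pos", "neu", "neg", "pos"]
    s = per_class_f1(gold, pred, ["neg", "pos", "neu"])
    for lab, (p, r, f) in s.items():
        print(lab, round(p, 3), round(r, 3), round(f, 3))
    print("macro", round(macro_f1(s), 3), "weighted", round(weighted_f1(s, gold), 3))
```

On the toy data the negative class scores F1 = 0.5 and the positive class 0.8. When classes are imbalanced, macro and weighted averages can differ noticeably, which is why it matters which aggregation an "overall" f-measure uses.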

Pages: 162-173

Page count: 12

Cited references: 41
