Achieving Reliable Sentiment Analysis in the Software Engineering Domain using BERT

Cited by: 38
Authors
Biswas, Eeshita [1]
Karabulut, Mehmet Efruz [1]
Pollock, Lori [1]
Vijay-Shanker, K. [1]
Affiliation
[1] Univ Delaware, Comp & Informat Sci, Newark, DE 19716 USA
Source
2020 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE MAINTENANCE AND EVOLUTION (ICSME 2020) | 2020
Keywords
Sentiment Analysis; Software Engineering; BERT; ANALYSIS TOOLS; IMPROVE
DOI
10.1109/ICSME46990.2020.00025
Chinese Library Classification
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Researchers have shown that sentiment analysis of software artifacts can potentially improve various software engineering tools, including API and library recommendation systems, code suggestion tools, and tools for improving communication among software developers. However, sentiment analysis techniques applied to software artifacts have not yet yielded high accuracy. Recent adaptations of sentiment analysis tools to the software domain have reported some improvements, but the f-measures for positive and negative sentences remain in the 0.4-0.64 range, which limits their practical usefulness for software engineering tools. In this paper, we explore the potential effectiveness of customizing BERT, a language representation model that has recently achieved very good results on various Natural Language Processing tasks on English texts, for sentiment analysis of software artifacts. We describe our application of BERT to analyzing the sentiment of sentences in Stack Overflow posts, and we compare a BERT sentiment classifier with state-of-the-art sentiment analysis techniques on a domain-specific data set created from Stack Overflow posts. We also investigate how sentiment analysis performance changes when using a data set three times larger than those used in previous studies. Our results show that the BERT classifier achieves reliable performance for sentiment analysis of software engineering texts. BERT combined with the larger data set achieves an overall f-measure of 0.87, with the f-measures for negative and positive sentences reaching 0.91 and 0.78 respectively, a significant improvement over the state of the art.
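The abstract reports a separate f-measure for each sentiment class. As a minimal sketch of how such per-class scores are typically computed from gold and predicted labels, the Python snippet below may help. The function name `per_class_f1`, the three-label scheme (including `neutral`), and the toy labels are assumptions made for illustration. They are not the paper's data or evaluation code, and the record does not say how the "overall" f-measure is averaged.

```python
from collections import Counter


def per_class_f1(gold, pred):
    """Return {label: (precision, recall, f1)} for every label seen.

    Hypothetical helper for illustration; not taken from the paper.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1          # correct prediction for class g
        else:
            fp[p] += 1          # p was predicted but is wrong
            fn[g] += 1          # g was the true class but was missed
    scores = {}
    for label in sorted(set(gold) | set(pred)):
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        scores[label] = (precision, recall, f1)
    return scores


# Toy labels, invented for this example only.
gold = ["negative", "negative", "positive", "neutral", "positive", "neutral"]
pred = ["negative", "neutral", "positive", "neutral", "negative", "neutral"]

scores = per_class_f1(gold, pred)
for label, (precision, recall, f1) in scores.items():
    print(f"{label}: P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

Reporting the score per class, rather than a single accuracy figure, shows whether a classifier handles the less frequent positive and negative sentences well.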
Pages: 162-173
Page count: 12
Related papers
41 items in total
  • [1] Abadi M, 2016, PROCEEDINGS OF OSDI'16: 12TH USENIX SYMPOSIUM ON OPERATING SYSTEMS DESIGN AND IMPLEMENTATION, P265
  • [2] Adhikari A., 2019, ARXIV PREPRINT ARXIV
  • [3] Ahmed T, 2017, IEEE INT CONF AUTOM, P106, DOI 10.1109/ASE.2017.8115623
  • [4] Biswas Eeshita, 2019, 2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR), P68, DOI 10.1109/MSR.2019.00020
  • [5] Bujang MA, 2017, ARCH OROFAC SCI, V12, P1
  • [6] Sentiment Polarity Detection for Software Development
    Calefato, Fabio
    Lanubile, Filippo
    Maiorano, Federico
    Novielli, Nicole
    [J]. EMPIRICAL SOFTWARE ENGINEERING, 2018, 23 (03) : 1352 - 1382
  • [7] SMOTE: Synthetic minority over-sampling technique
    Chawla, Nitesh V.
    Bowyer, Kevin W.
    Hall, Lawrence O.
    Kegelmeyer, W. Philip
    [J]. 2002, American Association for Artificial Intelligence (16)
  • [8] SEntiMoji: An Emoji-Powered Learning Approach for Sentiment Analysis in Software Engineering
    Chen, Zhenpeng
    Cao, Yanbin
    Lu, Xuan
    Mei, Qiaozhu
    Liu, Xuanzhe
    [J]. ESEC/FSE'2019: PROCEEDINGS OF THE 2019 27TH ACM JOINT MEETING ON EUROPEAN SOFTWARE ENGINEERING CONFERENCE AND SYMPOSIUM ON THE FOUNDATIONS OF SOFTWARE ENGINEERING, 2019, : 841 - 852
  • [9] Devlin J, 2019, 2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, P4171
  • [10] Ding J, 2018, 2018 IEEE/ACM 3RD INTERNATIONAL WORKSHOP ON EMOTION AWARENESS IN SOFTWARE ENGINEERING (SEMOTION), P7, DOI [10.1145/3194932.3194935, 10.1109/GLOCOM.2018.8647613]