Sentiment Polarity Detection for Software Development

Cited by: 1
Authors
Fabio Calefato
Filippo Lanubile
Federico Maiorano
Nicole Novielli
Affiliations
[1] University of Bari “A. Moro”, Dipartimento Jonico
[2] University of Bari “A. Moro”, Dipartimento di Informatica
Source
Empirical Software Engineering | 2018 / Vol. 23
Keywords
Sentiment Analysis; Communication Channels; Stack Overflow; Word Embedding; Social Software Engineering
DOI
Not available
Abstract
The role of sentiment analysis is increasingly emerging to study software developers’ emotions by mining crowd-generated content within social software engineering tools. However, off-the-shelf sentiment analysis tools have been trained on non-technical domains and general-purpose social media, thus resulting in misclassifications of technical jargon and problem reports. Here, we present Senti4SD, a classifier specifically trained to support sentiment analysis in developers’ communication channels. Senti4SD is trained and validated using a gold standard of Stack Overflow questions, answers, and comments manually annotated for sentiment polarity. It exploits a suite of both lexicon- and keyword-based features, as well as semantic features based on word embedding. With respect to a mainstream off-the-shelf tool, which we use as a baseline, Senti4SD reduces the misclassifications of neutral and positive posts as emotionally negative. To encourage replications, we release a lab package including the classifier, the word embedding space, and the gold standard with annotation guidelines.
Pages: 1352-1382
Page count: 30