Engineers, Aware! Commercial Tools Disagree on Social Media Sentiment: Analyzing the Sentiment Bias of Four Major Tools

Cited by: 2
Authors
Jung S.-G. [1 ]
Salminen J. [2 ]
Jansen B.J. [1 ]
Affiliations
[1] Hamad Bin Khalifa University, Doha
[2] University of Vaasa, Vaasa
Source
Proceedings of the ACM on Human-Computer Interaction | 2022 / Vol. 6 / Issue EICS
Keywords
agreement; bias; evaluation; sentiment analysis
DOI
10.1145/3532203
Abstract
Large commercial sentiment analysis tools are often deployed in software engineering due to their ease of use. However, it is not known how accurate these tools are, or whether the sentiment ratings given by one tool agree with those given by another. We use two datasets, (1) NEWS, consisting of 5,880 news stories and 60K comments from four social media platforms (Twitter, Instagram, YouTube, and Facebook), and (2) IMDB, consisting of 7,500 positive and 7,500 negative movie reviews, to investigate the agreement and bias of four widely used sentiment analysis (SA) tools: Microsoft Azure (MS), IBM Watson, Google Cloud, and Amazon Web Services (AWS). We find that the four tools assign the same sentiment label to less than half (48.1%) of the analyzed content. We also find that AWS exhibits a neutrality bias in both datasets; Google exhibits a bi-polarity bias in the NEWS dataset but a neutrality bias in the IMDB dataset; and IBM and MS exhibit no clear bias in the NEWS dataset but a bi-polarity bias in the IMDB dataset. Overall, IBM has the highest accuracy relative to the known ground truth in the IMDB dataset. The findings indicate that psycholinguistic features, especially affect, tone, and the use of adjectives, explain why the tools disagree. Engineers are urged to exercise caution when implementing SA tools in applications, as the choice of tool affects the sentiment labels obtained. © 2022 ACM.
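
The agreement analysis described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' pipeline: it assumes each tool's raw output has already been normalized to a common negative/neutral/positive label set (the to_label helper and its 0.25 cutoff are hypothetical), and it computes the share of items on which all four tools agree, pairwise agreement, and each tool's label distribution as a rough proxy for the neutrality and bi-polarity biases discussed above.

from collections import Counter
from itertools import combinations

def to_label(score: float, cutoff: float = 0.25) -> str:
    # Hypothetical mapping from a continuous sentiment score (e.g., a
    # [-1, 1] document score) to a discrete label; the cutoff is an
    # illustrative assumption, not the paper's setting.
    if score >= cutoff:
        return "positive"
    if score <= -cutoff:
        return "negative"
    return "neutral"

def full_agreement_rate(labels_by_tool: dict[str, list[str]]) -> float:
    # Share of items on which every tool assigns the same label.
    per_item = zip(*labels_by_tool.values())
    agree = sum(1 for labels in per_item if len(set(labels)) == 1)
    n_items = len(next(iter(labels_by_tool.values())))
    return agree / n_items

def pairwise_agreement(labels_by_tool: dict[str, list[str]]) -> dict[tuple[str, str], float]:
    # Raw percentage agreement for every pair of tools.
    out = {}
    for a, b in combinations(labels_by_tool, 2):
        matches = sum(x == y for x, y in zip(labels_by_tool[a], labels_by_tool[b]))
        out[(a, b)] = matches / len(labels_by_tool[a])
    return out

def label_distribution(labels: list[str]) -> dict[str, float]:
    # Per-tool label shares; a skew toward "neutral" or toward the poles
    # is a crude indicator of neutrality or bi-polarity bias.
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: counts[label] / total for label in ("negative", "neutral", "positive")}

if __name__ == "__main__":
    # Toy labels standing in for the normalized outputs of the four tools.
    labels_by_tool = {
        "AWS":    ["neutral", "neutral", "positive", "neutral", "negative"],
        "Google": ["positive", "negative", "positive", "neutral", "negative"],
        "IBM":    ["positive", "neutral", "positive", "negative", "negative"],
        "MS":     ["neutral", "negative", "positive", "neutral", "negative"],
    }
    print("full agreement:", full_agreement_rate(labels_by_tool))
    print("pairwise:", pairwise_agreement(labels_by_tool))
    for tool, labels in labels_by_tool.items():
        print(tool, label_distribution(labels))

In practice, each tool returns sentiment in a different format (some as categorical labels, some as continuous scores), so a normalization step like to_label would be needed before agreement can be computed on a shared label set.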