Assessing the Effects of Lemmatisation and Spell Checking on Sentiment Analysis of Online Reviews

Cited by: 1
Authors
Kavanagh, James [1]
Greenhow, Keith [1]
Jordanous, Anna [1]
Affiliation
[1] Univ Kent, Sch Comp, Canterbury, Kent, England
Source
2023 IEEE 17th International Conference on Semantic Computing, ICSC | 2023
Keywords
Natural Language Processing; Language parsing and understanding; Text analysis; Web text analysis; Sentiment analysis;
DOI
10.1109/ICSC56153.2023.00046
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
With many text preprocessing options, choosing the most efficient pipeline is important for accuracy and computational expense. Online text often contains non-standard English, spelling errors, colloquialisms, emojis, slang and other variations that affect current natural language processing tools, with no clear guidelines for preprocessing this type of text. In this work we analyse text preprocessing techniques using a dataset of online reviews scraped from iTunes and the Google Play store. The objective is to measure the efficacy of different combinations of these techniques to maximise the amount of detected sentiment in a dataset of 438,157 reviews. Sentiment detection was performed by two state-of-the-art sentiment analysers (RoBERTa and VADER). Statistical analysis of the results suggests preprocessing strategies for maximising sentiment detected within mental health app reviews and similar text formats.
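The comparison of preprocessing combinations described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the tiny lexicon, the spelling and lemma lookup tables, and every function name are hypothetical stand-ins for real tools such as VADER, RoBERTa, a spell checker, and a lemmatiser. It also assumes that "detected sentiment" means a review containing at least one sentiment-bearing token, which the record does not confirm.

```python
from itertools import combinations

# Hypothetical toy resources; the paper's actual analysers and tools are not reproduced.
TOY_LEXICON = {"good", "great", "love", "bad", "crash"}
TOY_SPELLING = {"gud": "good", "grate": "great", "luv": "love"}
TOY_LEMMAS = {"crashes": "crash", "crashed": "crash", "loved": "love"}


def spell_check(tokens):
    """Replace known misspellings using the toy lookup table."""
    return [TOY_SPELLING.get(t, t) for t in tokens]


def lemmatise(tokens):
    """Map inflected forms to their lemma using the toy lookup table."""
    return [TOY_LEMMAS.get(t, t) for t in tokens]


# Preprocessing steps under comparison, applied in this order when combined.
STEPS = {"spell_check": spell_check, "lemmatise": lemmatise}


def detected_fraction(reviews, step_names):
    """Fraction of reviews with at least one lexicon token after preprocessing."""
    detected = 0
    for review in reviews:
        tokens = review.lower().split()
        for name in step_names:
            tokens = STEPS[name](tokens)
        if any(t in TOY_LEXICON for t in tokens):
            detected += 1
    return detected / len(reviews)


def evaluate_pipelines(reviews):
    """Score every subset of preprocessing steps, including the empty pipeline."""
    names = list(STEPS)
    results = {}
    for size in range(len(names) + 1):
        for combo in combinations(names, size):
            results[combo] = detected_fraction(reviews, combo)
    return results


reviews = ["gud app", "it crashes daily", "luv it", "okay i guess"]
results = evaluate_pipelines(reviews)
for combo, fraction in results.items():
    print(combo or ("no preprocessing",), fraction)
```

On these four made-up reviews, each added step exposes more sentiment-bearing tokens to the toy scorer. With real analysers the effect of each step would have to be measured, which is the question the paper addresses.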
Pages: 235-238
Page count: 4