Assessing the Effects of Lemmatisation and Spell Checking on Sentiment Analysis of Online Reviews

Cited by: 1
Authors
Kavanagh, James [1]
Greenhow, Keith [1]
Jordanous, Anna [1]
Affiliations
[1] Univ Kent, Sch Comp, Canterbury, Kent, England
Source
2023 IEEE 17TH INTERNATIONAL CONFERENCE ON SEMANTIC COMPUTING, ICSC | 2023
Keywords
Natural Language Processing; Language parsing and understanding; Text analysis; Web text analysis; Sentiment analysis;
DOI
10.1109/ICSC56153.2023.00046
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
With many text preprocessing options available, choosing an efficient pipeline matters for both accuracy and computational expense. Online text often contains non-standard English, spelling errors, colloquialisms, emojis, slang and other variations that affect current natural language processing tools, and there are no clear guidelines for preprocessing this type of text. In this work we analyse text preprocessing techniques using a dataset of online reviews scraped from iTunes and the Google Play store. The objective is to measure the efficacy of different combinations of these techniques in maximising the amount of sentiment detected in a dataset of 438,157 reviews. Sentiment detection was performed by two state-of-the-art sentiment analysers (RoBERTa and VADER). Statistical analysis of the results suggests preprocessing strategies for maximising the sentiment detected within mental health app reviews and similar text formats.
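The comparison of preprocessing combinations described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' code: the dictionaries `SPELL_FIXES`, `LEMMAS` and `LEXICON` are hypothetical toy stand-ins for a real spell checker, a real lemmatiser and a real sentiment analyser such as VADER or RoBERTa, and `REVIEWS` is invented sample text, not data from the study. The sketch runs every subset of the preprocessing steps and reports the fraction of reviews in which any sentiment is detected.

```python
from itertools import combinations

# Hypothetical toy resources (assumptions, not from the paper).
SPELL_FIXES = {"gud": "good", "awfull": "awful"}
LEMMAS = {"loved": "love", "crashes": "crash", "helped": "help"}
LEXICON = {"good": 1.0, "love": 1.0, "help": 0.5, "awful": -1.0, "crash": -0.5}


def spell_check(tokens):
    """Replace known misspellings; a stand-in for a real spell checker."""
    return [SPELL_FIXES.get(t, t) for t in tokens]


def lemmatise(tokens):
    """Map inflected forms to a base form; a stand-in for a real lemmatiser."""
    return [LEMMAS.get(t, t) for t in tokens]


# Steps are applied in this order whenever they are part of a pipeline.
STEPS = {"spell_check": spell_check, "lemmatise": lemmatise}


def score(tokens):
    """Sum lexicon valences; a stand-in for a real sentiment analyser."""
    return sum(LEXICON.get(t, 0.0) for t in tokens)


def detection_rate(reviews, step_names):
    """Fraction of reviews with a non-zero sentiment score after preprocessing."""
    detected = 0
    for review in reviews:
        tokens = review.lower().split()
        for name in step_names:
            tokens = STEPS[name](tokens)
        if score(tokens) != 0.0:
            detected += 1
    return detected / len(reviews)


def compare_pipelines(reviews):
    """Evaluate every subset of preprocessing steps, including no preprocessing."""
    names = list(STEPS)
    results = {}
    for size in range(len(names) + 1):
        for combo in combinations(names, size):
            results[combo] = detection_rate(reviews, combo)
    return results


# Invented sample reviews for illustration only.
REVIEWS = [
    "This app is gud",
    "Loved it",
    "It crashes daily",
    "Awfull update",
    "Okay I guess",
]

if __name__ == "__main__":
    for combo, rate in compare_pipelines(REVIEWS).items():
        label = " + ".join(combo) if combo else "no preprocessing"
        print(f"{label}: {rate:.2f}")
```

In a real study the toy functions would be replaced by actual tools, and the detection rates for each combination would be compared with a statistical test rather than read off directly.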
Pages: 235-238
Number of pages: 4
Related papers
19 in total
  • [1] Androutsopoulos J, 2014, LINGUAE LITT, V36, P3
  • [2] Bergmanis T, 2018, P 2018 C N AM CHAPT, V1, P1391, DOI 10.18653/v1/N18-1126
  • [3] Bird, 2006, P COLING ACL INT PRE, P69
  • [4] Bittlingmayer AB., 2019, AMAZON REV SENTIMENT
  • [5] CHAI CY, 2022, NAT LANG ENG, P1, DOI DOI 10.3897/MYCOKEYS.90.83829
  • [6] Text normalization in social media: progress, problems and applications for a pre-processing system of casual English
    Clark, Eleanor
    Araki, Kenji
    [J]. COMPUTATIONAL LINGUISTICS AND RELATED FIELDS, 2011, 27 : 2 - 11
  • [7] The role of text pre-processing in opinion mining on a social media language dataset
    dos Santos, Fernando Leandro
    Ladeira, Marcelo
    [J]. 2014 BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS), 2014, : 50 - 54
  • [8] Foster J., 2011, AAAI WORKSH
  • [9] Studying the Effects of Text Preprocessing and Ensemble Methods on Sentiment Analysis of Brazilian Portuguese Tweets
    Gomes, Fernando Barbosa
    Adan-Coello, Juan Manuel
    Kintschner, Fernando Ernesto
    [J]. STATISTICAL LANGUAGE AND SPEECH PROCESSING, SLSP 2018, 2018, 11171 : 167 - 177
  • [10] Hutto C., 2014, P INT AAAI C WEB SOC, P216