Sentiment Analysis of Noisy Malay Text: State of Art, Challenges and Future Work

Times cited: 8
Authors
Abu Bakar, Muhammad Fakhrur Razi [1]
Idris, Norisma [1]
Shuib, Liyana [2]
Khamis, Norazlina [3]
Affiliations
[1] Univ Malaya, Fac Comp Sci & IT, Dept Artificial Intelligence, Kuala Lumpur 50603, Malaysia
[2] Univ Malaya, Fac Comp Sci & IT, Dept Informat Syst, Kuala Lumpur 50603, Malaysia
[3] Univ Malaysia Sabah, Fac Comp & Informat, Language Engn & Applicat Dev Res Grp, Kota Kinabalu 88400, Sabah, Malaysia
Keywords
Hybrid; lexicon-based; machine learning; noisy Malay text; sentiment analysis
DOI
10.1109/ACCESS.2020.2968955
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
Sentiment analysis (SA) is the study of automatically extracting people's opinions and emotions, in the form of sentiments, from natural language text. It is very useful in social media monitoring because it gives users an overall picture of public opinion on many topics. Most work on SA targets English text, and only a few studies focus on the Malay language. An existing review of SA for the Malay language covers only the SA approaches and the datasets. Several major issues have not been reviewed: the pre-processing techniques used to normalize noisy text, the performance measures most commonly employed for Malay SA, and the challenges facing Malay SA. Malaysians tend not to follow any consistent abbreviation rules when writing on social media. As a result, social media sites such as Facebook and Twitter contain a great deal of noisy text, which creates problems for the SA process. Hence, the aim of this study is to investigate the state of the art, challenges and future work of SA for Malay social media text. This study reviews the approaches, datasets, performance measures, and pre-processing techniques used in previous work on SA of Malay text. More than 700 articles from journals and conference proceedings were identified using the search keywords. However, only 17 relevant articles, published from 2013 to 2018, were reviewed. The findings of this review focus on three commonly used SA approaches: lexicon-based, machine learning, and hybrid.
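This is a minimal Python sketch of the normalization step and the lexicon-based approach mentioned above. It is not the method of any reviewed study. It assumes tiny, hypothetical dictionaries (`NORMALIZATION`, `LEXICON`, `NEGATORS`, `INTENSIFIERS`) invented for illustration. Real systems rely on much larger normalization resources and sentiment lexicons, and the negation handling here is deliberately simplistic.

```python
import re

# Hypothetical normalization dictionary: informal Malay spellings -> standard forms.
NORMALIZATION = {"x": "tidak", "tak": "tidak", "sgt": "sangat",
                 "bgs": "bagus", "yg": "yang", "dgn": "dengan"}

# Hypothetical toy sentiment lexicon: word -> polarity.
LEXICON = {"bagus": 1, "suka": 1, "hebat": 1, "teruk": -1, "benci": -1}

NEGATORS = {"tidak", "bukan"}       # flip the next sentiment word
INTENSIFIERS = {"sangat": 2}        # scale the next sentiment word


def normalize(text):
    """Lowercase, collapse letter runs, tokenize, and map informal tokens."""
    text = text.lower()
    # Collapse runs of 3+ identical characters ("bgssss" -> "bgs", "!!!" -> "!").
    text = re.sub(r"(.)\1{2,}", r"\1", text)
    tokens = re.findall(r"[a-z]+", text)
    return [NORMALIZATION.get(tok, tok) for tok in tokens]


def lexicon_score(tokens):
    """Sum lexicon polarities; a pending negator/intensifier applies to the next sentiment word."""
    score, negate, weight = 0, False, 1
    for tok in tokens:
        if tok in NEGATORS:
            negate = True
        elif tok in INTENSIFIERS:
            weight = INTENSIFIERS[tok]
        elif tok in LEXICON:
            value = LEXICON[tok] * weight
            score += -value if negate else value
            negate, weight = False, 1  # modifiers are consumed by this word
    return score


def classify(text):
    """Map the summed score to a polarity label."""
    score = lexicon_score(normalize(text))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

For example, `classify("Makanan ni sgt bgssss!!!")` normalizes the tokens to `makanan ni sangat bagus` and returns `"positive"`. By contrast, `classify("Servis x bagus, teruk")` returns `"negative"`: the informal negator `x` is mapped to `tidak`, which flips `bagus`. A machine learning approach would typically replace the hand-written scoring with a trained classifier over the same normalized tokens, and a hybrid approach would combine the two.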
Pages: 24687-24696
Page count: 10