Sentiment Analysis of Noisy Malay Text: State of Art, Challenges and Future Work

Cited by: 10

Authors:

Abu Bakar, Muhammad Fakhrur Razi [1]

Idris, Norisma [1]

Shuib, Liyana [2]

Khamis, Norazlina [3]

Affiliations:

[1] Univ Malaya, Fac Comp Sci & IT, Dept Artificial Intelligence, Kuala Lumpur 50603, Malaysia

[2] Univ Malaya, Fac Comp Sci & IT, Dept Informat Syst, Kuala Lumpur 50603, Malaysia

[3] Univ Malaysia Sabah, Fac Comp & Informat, Language Engn & Applicat Dev Res Grp, Kota Kinabalu 88400, Sabah, Malaysia

Source:

IEEE ACCESS | 2020 / Vol. 8

Keywords:

Hybrid; lexicon-based; machine learning; noisy Malay text; sentiment analysis

DOI:

10.1109/ACCESS.2020.2968955

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Sentiment analysis (SA) is a study in which people's opinions and emotions are automatically extracted, in the form of sentiments, from natural language text. It is very useful in social media monitoring because it allows users to gain an overall picture of public opinion on many topics. Most work on SA targets English text; only a few studies focus on the Malay language. Currently, reviews of SA for the Malay language focus only on the SA approaches and the datasets. Some major issues, such as the pre-processing techniques used to normalize noisy text, the most commonly employed performance measures for Malay SA, and the challenges for Malay SA, have not been reviewed. Malaysians tend not to follow any abbreviation rules when writing on social media. As a result, a lot of noisy text can be found on social media sites such as Facebook and Twitter, which creates problems for the SA process. Hence, the aim of this study is to investigate the state of the art, challenges, and future work of SA for Malay social media text. This study reviews the various approaches, datasets, performance measures, and pre-processing techniques used in previous work on SA of Malay text. More than 700 articles from journals and conference proceedings were identified using the search keywords; however, only 17 relevant articles published from 2013 to 2018 were reviewed. The findings of this review focus on three commonly used SA approaches: lexicon-based, machine learning, and hybrid.


Pages: 24687-24696

Page count: 10

