Translation Is Not Enough: Comparing Lexicon-based Methods for Sentiment Analysis in Persian

Cited by: 0
Authors
Basiri, Mohammad Ehsan [1]
Kabiri, Arman [1]
Affiliations
[1] Shahrekord Univ, Dept Comp Engn, Shahrekord, Iran
Source
2017 18TH CSI INTERNATIONAL SYMPOSIUM ON COMPUTER SCIENCE AND SOFTWARE ENGINEERING CONFERENCE (CSSE) | 2017
Keywords
component; Sentiment Analysis; Natural Language Processing; Persian Language; Lexicon-based approach; Opinion mining; Data Mining; Social Media
DOI
Not available
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Sentiment analysis is a subfield of data mining and natural language processing that aims to extract people's opinions and appraisals from their comments on the Web. In contrast to machine learning approaches, lexicon-based methods have important advantages: they are domain-independent and do not need a large annotated training corpus, and hence are faster. This makes the lexicon-based approach prevalent in the sentiment analysis community. For Persian, however, unlike English, lexicon-based sentiment analysis is still new. Few lexicons are available for sentiment analysis in Persian, and almost all of them are directly translated from English. In the current study, four lexicons are compared to show how much the lexicon matters to the performance of document-level sentiment analysis. Specifically, the Persian version of the NRC lexicon, SentiStrength, CNRC, and Adjectives are compared in a pure lexicon-based scenario. Experiments are carried out on the document-level edition of the SPerSent dataset. The results show that the direct translation used in NRC leads to the poorest performance, while the pre-processing and refinement of the lexicons used in SentiStrength and CNRC improve performance. The results also show that using only adjectives gives better results than using NRC.
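As a rough illustration of what a "pure lexicon-based scenario" at document level can look like, the following minimal Python sketch sums word polarities from a lexicon and labels the document by the sign of the total. It is not the authors' implementation: the toy lexicon, its English entries and scores, and the function names `score_document` and `classify` are all hypothetical, and the paper's actual pre-processing and scoring rules are not given in this record.

```python
import re

# Hypothetical toy lexicon (word -> polarity score), for illustration only.
# A real experiment would load one of the compared lexicons instead.
TOY_LEXICON = {"good": 1, "great": 2, "bad": -1, "terrible": -2}


def score_document(text, lexicon):
    """Sum the polarity scores of all lexicon words found in the text."""
    tokens = re.findall(r"\w+", text.lower())
    # Words missing from the lexicon contribute nothing to the score.
    return sum(lexicon.get(token, 0) for token in tokens)


def classify(text, lexicon):
    """Label a document by the sign of its summed polarity score."""
    score = score_document(text, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    review = "Great screen but terrible battery and bad support"
    print(score_document(review, TOY_LEXICON), classify(review, TOY_LEXICON))
```

In a sketch like this the lexicon is the only source of sentiment knowledge, so a word that is missing or mis-scored directly changes the document label. This is consistent with the abstract's point that lexicon quality drives performance.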
Pages: 36 - 41
Number of pages: 6