Twitter Dataset and Evaluation of Transformers for Turkish Sentiment Analysis

Cited by: 12
Authors
Koksal, Abdullatif [1]
Ozgur, Arzucan [1]
Affiliation
[1] Bogazici Univ, Bilgisayar Muhendisligi Bolumu, Istanbul, Turkey
Source
29TH IEEE CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS (SIU 2021), 2021
Keywords
sentiment analysis; Turkish dataset; Twitter; BounTi; transformers; BERT
DOI
10.1109/SIU53274.2021.9477814
Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Subject classification codes
0808; 0809
Abstract
Sentiment analysis is one of the key topics in Natural Language Processing, supporting applications that range from social media analysis to stock market prediction. Sentiment analysis datasets are generally collected with semi-supervision from shopping or review websites: users' text reviews are mapped to the scores those same users assigned. However, such datasets may contain errors introduced by this automatic mapping, and they generally lack characteristic features of social media text such as emojis, slang, and typos. To address these problems, one of the first manually annotated Turkish sentiment analysis datasets built from Twitter is proposed. The BounTi dataset contains Turkish tweets about specific universities in Turkey. Furthermore, the performance of multilingual and Turkish transformer models such as MBERT, XLM-RoBERTa, and BERTurk is analyzed on this dataset. The best proposed model is based on BERTurk and achieves a macro-averaged recall of 0.729 on the test set. Finally, a social media analysis demonstration is performed with the best model on Turkish tweets about a food brand. The BounTi dataset, fine-tuned models, and related scripts are publicly released.
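The headline result is reported as macro-averaged recall: recall is computed separately for each sentiment class and the per-class values are averaged without weighting, so a small class counts as much as a large one. The following is a minimal sketch of that metric in plain Python, not the authors' evaluation script; the function name `macro_recall` and the three-way label set (positive, negative, neutral) are assumptions for illustration, and the toy labels are hypothetical, not BounTi data.

```python
from collections import Counter


def macro_recall(y_true, y_pred):
    """Unweighted mean of per-class recall over the classes present in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Number of gold examples per class (the recall denominator).
    support = Counter(y_true)
    # Number of correctly predicted examples per gold class (the recall numerator).
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    recalls = [hits[c] / support[c] for c in support]
    return sum(recalls) / len(recalls)


# Hypothetical toy labels, assuming a positive/negative/neutral scheme.
gold = ["positive", "negative", "neutral", "neutral", "negative", "neutral"]
pred = ["positive", "neutral", "neutral", "neutral", "negative", "positive"]

# Per-class recall: positive 1/1, negative 1/2, neutral 2/3.
print(round(macro_recall(gold, pred), 3))  # prints 0.722
```

Here plain accuracy would be 4/6 ≈ 0.667, while macro recall is (1 + 0.5 + 0.667) / 3 ≈ 0.722, because the single positive example carries a full third of the weight. This sketch agrees with scikit-learn's `recall_score(..., average="macro")` whenever every predicted label also occurs in the gold labels.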
Pages: 4