Twitter Dataset and Evaluation of Transformers for Turkish Sentiment Analysis

Cited by: 12
Authors
Koksal, Abdullatif [1]
Ozgur, Arzucan [1]
Affiliations
[1] Bogazici Univ, Bilgisayar Muhendisligi Bolumu, Istanbul, Turkey
Source
29TH IEEE CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS (SIU 2021) | 2021
Keywords
sentiment analysis; Turkish dataset; Twitter; BounTi; transformers; BERT
DOI
10.1109/SIU53274.2021.9477814
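Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract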
Sentiment analysis is a key topic in Natural Language Processing that supports applications ranging from social media analysis to stock market prediction. Sentiment analysis datasets are generally collected with semi-supervision from shopping or review websites, where users' text reviews are mapped to the scores those users gave. However, these datasets might contain errors due to the automatic mapping, and they generally lack the characteristic features of social media text such as emojis, slang, and typos. To address these problems, one of the first manually annotated Turkish sentiment analysis datasets from Twitter is proposed. The BounTi dataset contains Turkish tweets about specific universities in Turkey. Furthermore, the performance of multilingual and Turkish transformer models such as MBERT, XLM-Roberta, and BERTurk is analyzed on this dataset. The best proposed model is based on BERTurk and achieves a macro-averaged recall of 0.729 on the test set. Finally, a social media analysis demonstration with the best model is performed on Turkish tweets about a food brand. The BounTi dataset, fine-tuned models, and related scripts are publicly released.
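The abstract reports its headline result as macro-averaged recall, which is the unweighted mean of per-class recall, so a rare class counts as much as a frequent one. The following is a minimal sketch of that metric in plain Python; the function name `macro_recall`, the three label names, and the toy predictions are illustrative assumptions and are not taken from the paper or its released scripts.

```python
from collections import defaultdict


def macro_recall(y_true, y_pred):
    """Return the unweighted mean of per-class recall.

    Classes are taken from y_true, so every class that has at least one
    gold example contributes equally, regardless of its frequency.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one example is required")

    support = defaultdict(int)  # gold examples per class
    hits = defaultdict(int)     # correctly predicted examples per class
    for gold, pred in zip(y_true, y_pred):
        support[gold] += 1
        if gold == pred:
            hits[gold] += 1

    per_class = [hits[label] / support[label] for label in support]
    return sum(per_class) / len(per_class)


# Toy data with hypothetical labels (not from the BounTi dataset).
y_true = ["positive", "positive", "negative", "neutral", "neutral", "neutral"]
y_pred = ["positive", "negative", "negative", "neutral", "neutral", "positive"]

score = macro_recall(y_true, y_pred)
print(f"macro-averaged recall: {score:.3f}")
```

In the toy data, per-class recall is 1/2, 1/1, and 2/3, and the function averages those three values with equal weight.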
Pages: 4