Multi-label Classification for Hate Speech and Abusive Language in Indonesian-Local Languages

Cited by: 2
Authors
Asti, Ajeng Dwi [1]
Budi, Indra [1]
Ibrohim, Muhammad Okky [1]
Affiliation
[1] Univ Indonesia, Fac Comp Sci, Depok, Indonesia
Source
13TH INTERNATIONAL CONFERENCE ON ADVANCED COMPUTER SCIENCE AND INFORMATION SYSTEMS (ICACSIS 2021) | 2021
Keywords
hate speech; multi-label classification; Indonesian local language; Twitter
DOI
10.1109/ICACSIS53237.2021.9631316
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Each instance of hate speech has a target, a category, and a level. Detecting all three helps the authorities decide which hate speech cases to handle first. Various studies in Indonesia have addressed abusive language and hate speech, including their targets, categories, and levels, but only for Indonesian and English text. However, the many local languages spoken in Indonesia also make it possible for hate speech to be written in a local language. This study compares some of the best-performing machine learning algorithms, transformation methods, and feature extraction techniques for classifying abusive language and hate speech, together with their targets, categories, and levels. It uses Twitter data in Indonesian and local languages. The study covers the five local languages in Indonesia with the most speakers: Javanese, Sundanese, Madurese, Minangkabau, and Musi (Palembang). The algorithms are Support Vector Machine (SVM), Multinomial Naive Bayes (MNB), and Random Forest Decision Tree (RFDT). These are combined with Binary Relevance (BR), Classifier Chains (CC), and Label Powerset (LP) as transformation methods. Terms are weighted with TF-IDF over word n-gram and character n-gram features. The results show that SVM with the CC transformation method and unigram features gave the highest F1-scores for Javanese and Sundanese, at 66.33% and 65.68%, respectively. For the Madurese, Minangkabau, and Musi data, the best F1-scores were obtained with RFDT using the CC transformation method and unigram features: 76.37%, 80.75%, and 77.34%, respectively.
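The three transformation methods named above differ in how they reshape the multi-label target before an ordinary single-label classifier (SVM, MNB, or RFDT in the study) is trained. The Python sketch below shows only that reshaping step, on a tiny hypothetical dataset.

The feature vectors stand in for TF-IDF vectors. The label names, function names, and stub "models" are illustrative assumptions, not the authors' implementation.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Vector = List[float]        # stand-in for a TF-IDF feature vector (assumption)
LabelRow = Tuple[int, ...]  # one 0/1 entry per label


def binary_relevance(X: Sequence[Vector], Y: Sequence[LabelRow]):
    """BR: one independent binary dataset per label; label correlations are ignored."""
    n_labels = len(Y[0])
    return [[(list(x), y[j]) for x, y in zip(X, Y)] for j in range(n_labels)]


def classifier_chain(X: Sequence[Vector], Y: Sequence[LabelRow],
                     order: Optional[Sequence[int]] = None):
    """CC: the dataset for each label appends the labels earlier in the chain
    as extra features (the true labels are used at training time)."""
    n_labels = len(Y[0])
    order = list(range(n_labels)) if order is None else list(order)
    datasets = []
    for pos, j in enumerate(order):
        earlier = order[:pos]
        datasets.append(
            [(list(x) + [float(y[k]) for k in earlier], y[j]) for x, y in zip(X, Y)]
        )
    return datasets


def predict_chain(models: Sequence[Callable[[Vector], int]], x: Vector,
                  order: Sequence[int]) -> LabelRow:
    """At prediction time, each model in the chain sees x plus the chain's
    own earlier predictions; the result is returned in label-index order."""
    feats = list(x)
    preds: Dict[int, int] = {}
    for j, model in zip(order, models):
        preds[j] = model(feats)
        feats.append(float(preds[j]))
    return tuple(preds[j] for j in sorted(preds))


def label_powerset(X: Sequence[Vector], Y: Sequence[LabelRow]):
    """LP: each distinct label combination becomes one class of a single
    multi-class problem."""
    classes: Dict[LabelRow, int] = {}
    data = []
    for x, y in zip(X, Y):
        cls = classes.setdefault(tuple(y), len(classes))
        data.append((list(x), cls))
    return data, classes


# Hypothetical toy data; label columns: (hate_speech, abusive, target_individual)
X = [[0.1, 0.0], [0.0, 0.7], [0.5, 0.5], [0.2, 0.1]]
Y = [(1, 0, 1), (0, 0, 0), (1, 1, 0), (1, 0, 1)]

br = binary_relevance(X, Y)
cc = classifier_chain(X, Y)
lp_data, lp_classes = label_powerset(X, Y)

# Hypothetical stub "trained" models standing in for SVM/MNB/RFDT, for illustration only
stubs = [
    lambda f: int(f[0] > 0.3),
    lambda f: int(f[-1] == 1.0 and f[1] > 0.3),
    lambda f: 0,
]

print(len(br), [len(d[0][0]) for d in cc], len(lp_classes))
print(predict_chain(stubs, [0.5, 0.5], order=[0, 1, 2]))
```

These are general properties of the methods, not findings reported in the abstract:

- BR ignores correlations between labels.
- CC passes earlier labels down the chain, so its results can depend on the chain order.
- LP can only predict label combinations that appear in the training data.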
Pages: 325–330
Page count: 6