Indonesian Twitter Emotion Recognition Model using Feature Engineering

Cited by: 0

Authors:

Sutoyo, Rhio [1]

Warnars, Harco Leslie Hendric Spits [1]

Isa, Sani Muhamad [2]

Budiharto, Widodo [3]

Affiliations:

[1] Bina Nusantara Univ, Comp Sci Dept, BINUS Grad Program Doctor Comp Sci, Jakarta 11480, Indonesia

[2] Bina Nusantara Univ, Comp Sci Dept, BINUS Grad Program Master Comp Sci, Jakarta 11480, Indonesia

[3] Bina Nusantara Univ, Sch Comp Sci, Comp Sci Dept, Jakarta 11480, Indonesia

Source:

INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS | 2023, Vol. 14, No. 12

Keywords:

Text classification; feature engineering; emotion recognition; Indonesian tweet; natural language processing

DOI:

10.14569/IJACSA.2023.01412108

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods]

Discipline classification code:

081202

Abstract:

Twitter is a social media platform that hosts a large amount of unstructured natural language text. Its content can be used to capture human behavior through the emotions expressed in tweets, since people commonly express emotions in their tweets to show their feelings. Hence, it is crucial to recognize a text's underlying emotions to understand the message's meaning. Feature engineering is the process of transforming raw data into informative, often overlooked features. This research explores feature engineering techniques to find the best features for building an emotion recognition model on an Indonesian Twitter dataset. Two text data representations were used: TF-IDF and word embedding. For TF-IDF, this research proposed 12 feature engineering configurations by combining data stemming, data augmentation, and machine learning classifiers. For word embedding, it proposed 27 feature engineering configurations by combining three word embedding models, three pooling techniques, and three machine learning classifiers. In total, there are 39 feature engineering combinations. The configuration with the best F1 score is TF-IDF with logistic regression on the stemmed and augmented dataset. The model achieved 65.27% accuracy and a 66.09% F1 score. The detailed characteristics of the top seven TF-IDF models also follow the same feature engineering configuration. Lastly, this work improves on the performance of previous research by 1.44% and 2.01% for the word2vec and fastText approaches, respectively.
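The configuration counts in the abstract can be checked with a short enumeration. The sketch below is illustrative only: the abstract gives the counts (12, 27, 39) and names logistic regression, word2vec, and fastText, but every other label is a hypothetical placeholder. It also assumes that stemming and augmentation are each on/off choices and that the same three classifiers are used for TF-IDF, which is one reading consistent with 12 = 2 x 2 x 3.

```python
from itertools import product

# Assumption: stemming and augmentation are binary choices (not stated in the abstract).
stemming = ["unstemmed", "stemmed"]
augmentation = ["original", "augmented"]

# Only logistic regression is named in the abstract; the other two are placeholders.
classifiers = ["logistic_regression", "classifier_b", "classifier_c"]

# word2vec and fastText are named in the abstract; the third model and
# all pooling technique names are placeholders.
embeddings = ["word2vec", "fastText", "embedding_c"]
pooling = ["pooling_a", "pooling_b", "pooling_c"]

# TF-IDF grid: stemming x augmentation x classifier
tfidf_configs = [("tfidf", s, a, c)
                 for s, a, c in product(stemming, augmentation, classifiers)]

# Word embedding grid: embedding model x pooling technique x classifier
embedding_configs = list(product(embeddings, pooling, classifiers))

all_configs = tfidf_configs + embedding_configs

print(len(tfidf_configs), len(embedding_configs), len(all_configs))
# prints: 12 27 39
```

Under these assumptions, the best-scoring configuration reported in the abstract corresponds to the tuple `("tfidf", "stemmed", "augmented", "logistic_regression")`.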


Pages: 1057-1065

Page count: 9
