HiSAT: Hierarchical Framework for Sentiment Analysis on Twitter Data

Times cited: 1

Authors:

Kommu, Amrutha [1]

Patel, Snehal [1]

Derosa, Sebastian [1]

Wang, Jiayin [1]

Varde, Aparna S. [1]

Affiliation:

[1] Montclair State Univ, Montclair, NJ 07043 USA

Source:

INTELLIGENT SYSTEMS AND APPLICATIONS, VOL 1 | 2023 / Volume 542

Funding:

U.S. National Science Foundation;

Keywords:

Bayesian models; Knowledge discovery; Logistic Regression; NLP; Opinion mining; Random Forest; Social media; Text mining; EMOTION RECOGNITION FEATURES;

DOI:

10.1007/978-3-031-16072-1_28

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Social media websites such as Twitter have become so indispensable that people use them almost daily to share their emotions, opinions, suggestions, and thoughts. Motivated by this behavior, this study defines an approach to automatically classify tweets into two classes: hate speech and non-hate speech. Such classification provides valuable information for analyzing and understanding target audiences and for spotting marketing trends. We thus propose HiSAT, a Hierarchical framework for Sentiment Analysis on Twitter data. Sentiments and opinions in tweets are highly unstructured and do not follow a well-defined sequence. Tweets constitute heterogeneous data from many sources in different formats, and they may express positive, negative, or neutral sentiment. Hence, HiSAT applies Natural Language Processing: tokenization, stemming, and lemmatization convert text into tokens, and Bag-of-Words (BoW) and Term Frequency-Inverse Document Frequency (TF-IDF) convert text sentences into numeric vectors. These vectors are then fed as inputs to machine learning algorithms within the HiSAT framework; specifically, Random Forest, Logistic Regression, and Naive Bayes serve as binary text classifiers that separate hate speech from non-hate speech. Experiments with HiSAT show that Random Forest outperforms the other two classifiers in predicting the correct labels, with accuracy above 95%. We present the HiSAT approach, its implementation and experiments, along with related work and ongoing research.
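The abstract lists HiSAT's stages (tokenization, stemming/lemmatization, BoW and TF-IDF vectorization, then binary classification) without showing how they fit together. The Python sketch below is a minimal, dependency-free illustration of such a pipeline under stated assumptions; it is not the authors' implementation. The suffix-stripping `stem` function is a hypothetical, crude stand-in for a real stemmer or lemmatizer; `tf_idf` uses the common `tf * log(N / df)` weighting (the paper's exact variant is not given in this record); and only Naive Bayes, one of the three classifiers named, is shown, trained on bag-of-words counts. The four toy tweets and all function and class names are invented for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical suffix list: a crude stand-in for a real stemmer or lemmatizer
# (e.g. Porter stemming, WordNet lemmatization); not taken from HiSAT.
SUFFIXES = ("ing", "ed", "ly", "es", "s")


def tokenize(text):
    """Lowercase, drop @mentions and URLs, and keep alphabetic tokens."""
    text = re.sub(r"@\w+|https?://\S+", " ", text.lower())
    return re.findall(r"[a-z]+", text)


def stem(token):
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Text -> list of stemmed tokens."""
    return [stem(t) for t in tokenize(text)]


def bag_of_words(tokens):
    """BoW vector: raw count of each term in one document."""
    return Counter(tokens)


def tf_idf(docs_tokens):
    """TF-IDF vectors with tf = count / doc length and idf = log(N / df)."""
    n = len(docs_tokens)
    df = Counter(term for tokens in docs_tokens for term in set(tokens))
    vectors = []
    for tokens in docs_tokens:
        total = len(tokens) or 1  # avoid dividing by zero on empty tweets
        vectors.append({term: (count / total) * math.log(n / df[term])
                        for term, count in bag_of_words(tokens).items()})
    return vectors


class ToyNaiveBayes:
    """Multinomial Naive Bayes over BoW counts with add-alpha (Laplace) smoothing."""

    def fit(self, docs_tokens, labels, alpha=1.0):
        self.alpha = alpha
        self.classes = sorted(set(labels))
        self.vocab = {t for tokens in docs_tokens for t in tokens}
        self.log_prior, self.counts, self.totals = {}, {}, {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs_tokens, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(labels))
            # Sum the per-document BoW counters into one counter per class.
            self.counts[c] = sum((bag_of_words(d) for d in class_docs), Counter())
            self.totals[c] = sum(self.counts[c].values())
        return self

    def log_score(self, tokens, c):
        """log P(c) + sum of log P(term | c); terms unseen in training are skipped."""
        v = len(self.vocab)
        score = self.log_prior[c]
        for t in tokens:
            if t in self.vocab:
                score += math.log((self.counts[c][t] + self.alpha)
                                  / (self.totals[c] + self.alpha * v))
        return score

    def predict(self, tokens):
        """Return the class with the highest log posterior score."""
        return max(self.classes, key=lambda c: self.log_score(tokens, c))


# Invented toy tweets for illustration (1 = hate speech, 0 = non-hate speech).
TRAIN = [
    ("you are worthless and disgusting", 1),
    ("go away you stupid idiot", 1),
    ("what a lovely sunny day", 0),
    ("thanks for sharing this great article", 0),
]
docs = [preprocess(text) for text, _ in TRAIN]
labels = [label for _, label in TRAIN]
model = ToyNaiveBayes().fit(docs, labels)

print(model.predict(preprocess("you stupid worthless idiot")))  # 1
print(model.predict(preprocess("great lovely article")))  # 0
```

A real pipeline would more likely rely on library components such as scikit-learn's `CountVectorizer`, `TfidfVectorizer`, `MultinomialNB`, `LogisticRegression`, and `RandomForestClassifier`, together with a stemmer or lemmatizer from NLTK; which tools HiSAT actually used is not stated in this record. The reported accuracy (above 95% for Random Forest) comes from the paper's experiments, not from this toy example.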


Pages: 376-392

Number of pages: 17

Number of references: 30

