Optimizing Semantic Deep Forest for tweet topic classification

Cited by: 30
Authors
Daouadi, Kheir Eddine [1]
Rebai, Rim Zghal [2]
Amous, Ikram [3]
Affiliations
[1] Sfax Univ, Fac Econ & Management Sfax, Sfax, Tunisia
[2] Sfax Univ, Higher Inst Comp Sci & Multimedia Sfax, Sfax, Tunisia
[3] Sfax Univ, Natl Sch Elect & Telecommun Sfax, Sfax, Tunisia
Keywords
Semantic Deep Forest; Contextual Word2vec; Topic classification; Event detection; Twitter; Bot
DOI
10.1016/j.is.2021.101801
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Topic detection from Twitter has attracted the attention of researchers around the world, and different topic classification approaches have been proposed as a result of these research efforts. However, four major challenges remain in this context: the reliance on handcrafted features, the use of Deep Learning algorithms with a very large number of parameters, the still limited performance of existing approaches, and the lack of sufficient labeled datasets. We propose Semantic Deep Forest (SDF), a topic classification approach that incorporates contextual Word2vec, WordNet and Deep Forest to detect topics from Twitter accurately. Moreover, an extensive parameter sensitivity analysis was conducted to fine-tune the parameters of SDF for our tweet topic classification task and achieve the best performance. We conducted experiments on three benchmark datasets with standard evaluation scenarios. Experimental results show that: (1) the proposed contextual Word2vec models can be successfully used for tweet topic classification and outperform existing state-of-the-art embedding models; (2) the proposed SDF improves the accuracy of tweet topic classification and outperforms existing state-of-the-art classification approaches; (3) the proposed SDF does not require a huge amount of labeled data to achieve good performance, an advantage that the majority of state-of-the-art approaches lack. © 2021 Elsevier Ltd. All rights reserved.
Pages: 10