Impact of convolutional neural network and FastText embedding on text classification

Cited by: 0
Authors
Muhammad Umer
Zainab Imtiaz
Muhammad Ahmad
Michele Nappi
Carlo Medaglia
Gyu Sang Choi
Arif Mehmood
Affiliations
[1] The Islamia University of Bahawalpur, Department of Computer Science & Information Technology
[2] Khwaja Fareed University of Engineering and Information Technology (KFUEIT), Department of Computer Science
[3] Khwaja Fareed University of Engineering and Information Technology (KFUEIT), Department of Computer Engineering
[4] University of Salerno, Department of Computer Science
[5] Link Campus University of Rome, Research Department
[6] Yeungnam University, Department of Information and Communication Engineering
Source
Multimedia Tools and Applications | 2023 / Volume 82
Keywords
Convolutional Neural Network (CNN); FastText; Text mining; Deep learning; Natural language processing
DOI
Not available
Abstract
Efficient word representation techniques (word embeddings) combined with modern machine learning models have shown reasonable improvement on automatic text classification tasks. However, the effectiveness of such techniques has not yet been evaluated when the word vector representation available for training is insufficient. Convolutional Neural Networks (CNNs) have achieved significant results in pattern recognition, image analysis, and text classification. This study investigates the application of the CNN model to text classification problems through experimentation and analysis. We trained our classification model with a prominent word embedding generation model, FastText, on six publicly available benchmark datasets: AG News, Amazon Full, Amazon Polarity, Yahoo Question Answer, Yelp Full, and Yelp Polarity. Furthermore, the proposed model was also tested on the non-benchmark Twitter US Airlines dataset. The analysis indicates that using FastText as the word embedding is a very promising approach.
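The abstract does not describe the architecture, so the following is only a minimal sketch of the two ideas it names, not the authors' implementation. It assumes FastText-style character n-grams (a word wrapped in `<` and `>` markers, n-gram lengths 3 to 6) and a single convolution filter with ReLU and max-over-time pooling. The function names `char_ngrams` and `conv1d_max_pool`, the toy vectors, and the filter weights are all hypothetical.

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Return character n-grams of a word wrapped in '<' and '>' boundary markers.

    Subword units like these let an embedding model build a vector for a
    word it has never seen, which is the motivation for using them when
    training vocabulary is limited.
    """
    wrapped = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams


def conv1d_max_pool(embeddings, kernel):
    """Apply one convolution filter over a sentence and max-pool the result.

    embeddings: list of word vectors, one per token (assumed toy input).
    kernel: list of weight rows; its length is the filter width in tokens.
    Assumes the sentence has at least as many tokens as the filter width.
    """
    width = len(kernel)
    responses = []
    for start in range(len(embeddings) - width + 1):
        window = embeddings[start:start + width]
        # Dot product between the filter and the current window of word vectors.
        score = sum(
            w * x
            for k_row, e_row in zip(kernel, window)
            for w, x in zip(k_row, e_row)
        )
        responses.append(max(score, 0.0))  # ReLU activation
    return max(responses)  # max-over-time pooling


if __name__ == "__main__":
    print(char_ngrams("where", 3, 3))
    toy_sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # hypothetical 2-d embeddings
    toy_filter = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical filter spanning 2 tokens
    print(conv1d_max_pool(toy_sentence, toy_filter))
```

In a full classifier, many such filters of several widths would be learned, and their pooled outputs would be concatenated and passed to a softmax layer; that part is omitted here.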
Pages: 5569-5585
Page count: 16