Impact of convolutional neural network and FastText embedding on text classification

Cited by: 56
Authors
Umer, Muhammad [1]
Imtiaz, Zainab [2]
Ahmad, Muhammad [3]
Nappi, Michele [4]
Medaglia, Carlo [5]
Choi, Gyu Sang [6]
Mehmood, Arif [1]
Affiliations
[1] Islamia Univ Bahawalpur, Dept Comp Sci & Informat Technol, Bahawalpur 63100, Pakistan
[2] Khwaja Fareed Univ Engn & Informat Technol KFUEIT, Dept Comp Sci, Rahim Yar Khan, Pakistan
[3] Khwaja Fareed Univ Engn & Informat Technol KFUEIT, Dept Comp Engn, Rahim Yar Khan, Pakistan
[4] Univ Salerno, Dept Comp Sci, Fisciano, Italy
[5] Link Campus Univ Rome, Res Dept, Via Casale San Pio V 44, I-00165 Rome, Italy
[6] Yeungnam Univ, Dept Informat & Commun Engn, Gyongsan 38541, South Korea
Funding
National Research Foundation, Singapore;
Keywords
Convolutional Neural Network (CNN); FastText; Text mining; Deep learning; Natural language processing; SENTIMENT;
DOI
10.1007/s11042-022-13459-x
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
Efficient word representation techniques (word embeddings) combined with modern machine learning models have shown reasonable improvement on automatic text classification tasks. However, the effectiveness of such techniques has not yet been evaluated when the word vector representation available for training is insufficient. Convolutional Neural Networks (CNNs) have achieved significant results in pattern recognition, image analysis, and text classification. This study investigates the application of the CNN model to text classification problems through experimentation and analysis. We trained our classification model with a prominent word embedding generation model, FastText, on six publicly available benchmark datasets: AG News, Amazon Full, Amazon Polarity, Yahoo Question Answer, Yelp Full, and Yelp Polarity. Furthermore, the proposed model was also tested on the Twitter US airlines non-benchmark dataset. The analysis indicates that using FastText as the word embedding is a very promising approach.
Pages: 5569-5585
Number of pages: 17