Impact of convolutional neural network and FastText embedding on text classification

Cited by: 56
Authors
Umer, Muhammad [1]
Imtiaz, Zainab [2]
Ahmad, Muhammad [3]
Nappi, Michele [4]
Medaglia, Carlo [5]
Choi, Gyu Sang [6]
Mehmood, Arif [1]
Affiliations
[1] Islamia Univ Bahawalpur, Dept Comp Sci & Informat Technol, Bahawalpur 63100, Pakistan
[2] Khwaja Fareed Univ Engn & Informat Technol KFUEIT, Dept Comp Sci, Rahim Yar Khan, Pakistan
[3] Khwaja Fareed Univ Engn & Informat Technol KFUEIT, Dept Comp Engn, Rahim Yar Khan, Pakistan
[4] Univ Salerno, Dept Comp Sci, Fisciano, Italy
[5] Link Campus Univ Rome, Res Dept, Via Casale San Pio V 44, I-00165 Rome, Italy
[6] Yeungnam Univ, Dept Informat & Commun Engn, Gyongsan 38541, South Korea
Funding
National Research Foundation of Singapore
Keywords
Convolutional Neural Network (CNN); FastText; Text mining; Deep learning; Natural language processing; SENTIMENT;
DOI
10.1007/s11042-022-13459-x
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Efficient word representation techniques (word embeddings) combined with modern machine learning models have shown reasonable improvements on automatic text classification tasks. However, the effectiveness of such techniques has not yet been evaluated when the word vector representations available for training are insufficient. Convolutional Neural Networks (CNNs) have achieved significant results in pattern recognition, image analysis, and text classification. This study investigates the application of the CNN model to text classification problems through experimentation and analysis. We trained our classification model with FastText, a prominent word embedding generation model, on six publicly available benchmark datasets: AG News, Amazon Full, Amazon Polarity, Yahoo Question Answer, Yelp Full, and Yelp Polarity. Furthermore, the proposed model has also been tested on the non-benchmark Twitter US Airlines dataset. The analysis indicates that using FastText as the word embedding is a very promising approach.
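The abstract describes feeding FastText word embeddings into a CNN text classifier but gives no architecture details. The following is a minimal, untrained sketch of that general pipeline in plain Python, not the authors' model. It assumes a simplified FastText-style subword embedding (each word vector is the average of hashed character n-gram vectors, so even unseen words get a representation) and the common single-layer text CNN (filters sliding over consecutive word vectors, ReLU, then max-over-time pooling, then a softmax output layer). All sizes, the hash-bucket count, the filter widths, and the names `char_ngrams`, `word_vector`, `conv_max_pool`, and `classify` are illustrative assumptions.

```python
import math
import random

# All sizes below are illustrative assumptions, not values from the paper.
DIM = 8              # word-vector dimension (real FastText vectors are usually 100 or 300)
BUCKETS = 2000       # hash buckets for character n-grams (fastText's default is 2,000,000)
MIN_N, MAX_N = 3, 6  # character n-gram lengths used by the FastText subword model
NUM_CLASSES = 4      # e.g. AG News has four topic classes
FILTER_WIDTHS = (2, 3, 4)
FILTERS_PER_WIDTH = 4

rng = random.Random(0)  # fixed seed so the sketch is deterministic


def rand_vec(n):
    return [rng.uniform(-0.1, 0.1) for _ in range(n)]


def fnv1a(s):
    """32-bit FNV-1a hash, used here to map an n-gram to a bucket."""
    h = 2166136261
    for byte in s.encode("utf-8"):
        h ^= byte
        h = (h * 16777619) & 0xFFFFFFFF
    return h


def char_ngrams(word, min_n=MIN_N, max_n=MAX_N):
    """Character n-grams of '<word>' plus the whole padded word.
    Simplification: the whole word is hashed like any other n-gram."""
    padded = f"<{word}>"
    grams = [padded[i:i + n]
             for n in range(min_n, max_n + 1)
             for i in range(len(padded) - n + 1)]
    grams.append(padded)
    return grams


# Untrained embedding table: one random vector per hash bucket.
ngram_table = [rand_vec(DIM) for _ in range(BUCKETS)]


def word_vector(word):
    """Average the vectors of the word's n-grams; works for unseen words too."""
    grams = char_ngrams(word)
    vec = [0.0] * DIM
    for g in grams:
        row = ngram_table[fnv1a(g) % BUCKETS]
        for d in range(DIM):
            vec[d] += row[d]
    return [v / len(grams) for v in vec]


# Untrained CNN parameters: each filter spans `width` consecutive word vectors.
filters = [([rand_vec(DIM) for _ in range(width)], 0.0)
           for width in FILTER_WIDTHS for _ in range(FILTERS_PER_WIDTH)]
out_w = [rand_vec(len(filters)) for _ in range(NUM_CLASSES)]
out_b = [0.0] * NUM_CLASSES


def conv_max_pool(seq, filt, bias):
    """Slide one filter over the sequence and max-pool the ReLU activations.
    Returns 0.0 when the text is shorter than the filter."""
    width = len(filt)
    best = 0.0  # ReLU(x) = max(0, x), so starting at 0.0 pools ReLU outputs
    for i in range(len(seq) - width + 1):
        act = bias + sum(filt[j][d] * seq[i + j][d]
                         for j in range(width) for d in range(DIM))
        best = max(best, act)
    return best


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(text):
    """Return class probabilities for a whitespace-tokenised text."""
    seq = [word_vector(w) for w in text.lower().split()]
    features = [conv_max_pool(seq, f, b) for f, b in filters]
    logits = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(out_w, out_b)]
    return softmax(logits)


if __name__ == "__main__":
    probs = classify("stocks rallied after the earnings report")
    print([round(p, 3) for p in probs])
```

Because the weights are random, the printed probabilities carry no meaning. In practice the filters and output layer would be trained with a cross-entropy loss, and pre-trained FastText vectors would replace the random n-gram table.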
Pages: 5569–5585
Page count: 17