A Short Text Classification Method Based on N-Gram and CNN

Cited by: 41
Authors
Wang, Haitao [1]
He, Jie [1]
Zhang, Xiaohong [1]
Liu, Shufen [2]
Affiliations
[1] Henan Polytech Univ, Coll Comp Sci & Technol, Jiaozuo 454000, Henan, Peoples R China
[2] Jilin Univ, Coll Comp Sci & Technol, Changchun 130012, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Short text; Classification; Convolution neural network; N-gram; Concentration mechanism;
DOI
10.1049/cje.2020.01.001
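Chinese Library Classification number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract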
Text classification is a fundamental task in natural language processing (NLP) applications. Most existing work relies on either explicit or implicit text representations to address this problem. These techniques work well for sentences, but they cannot be applied directly to short texts, which are brief and sparse. During text classification, a traditional multi-size-filter convolutional neural network (CNN) obtains only simple word-vector features and ignores important features. To address this, we propose a CNN-based short text classification model that obtains rich text features by adopting a nonlinear sliding method and an N-gram language model, and that picks out the key features using a concentration mechanism. In addition, a pooling operation preserves the text features as far as possible. Experiments show that, compared with traditional machine learning algorithms and convolutional neural networks, the proposed method markedly improves short text classification results.
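The abstract names three ingredients (N-gram features from a sliding window, a weighting step that picks out key features, and pooling) without giving details. The following is a minimal sketch of those generic building blocks, not the authors' model: the function names, the window sizes (1, 2, 3), the use of a softmax as the weighting step, and the toy scores are all assumptions made for illustration.

```python
import math


def ngrams(tokens, n):
    """Return all contiguous n-grams (as tuples) found by sliding a window of size n."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def multi_size_ngrams(tokens, sizes=(1, 2, 3)):
    """Collect n-grams for several window sizes (the sizes are an assumption)."""
    return {n: ngrams(tokens, n) for n in sizes}


def softmax(scores):
    """Turn raw scores into weights that sum to 1 (a stand-in for the weighting step)."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def max_pool(values):
    """Max-over-time pooling: keep the strongest response."""
    return max(values)


tokens = "cheap flights to paris".split()
features = multi_size_ngrams(tokens)

# Hypothetical relevance scores for the three bigrams, for illustration only.
bigram_scores = [0.2, 1.5, 0.7]
weights = softmax(bigram_scores)
strongest = max_pool(bigram_scores)
```

For a four-token input, the sliding window yields four unigrams, three bigrams, and two trigrams. In a trained model the scores would come from learned convolution filters rather than hand-written values.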
Pages: 248-254
Page count: 7