A Text Vector Representation Model Merging Multi-Granularity Information

Authors
Nie W. [1]
Chen Y. [1]
Ma J. [1]
Affiliations
[1] College of Economics and Management, Nanjing University of Aeronautics and Astronautics, Nanjing
Keywords
Convolutional Neural Network; Text Classification; Topic Model; Word Vector
DOI
10.11925/infotech.2096-3467.2018.1161
Abstract
[Objective] This paper proposes a model that extracts semantic features from texts more comprehensively and improves how well text vectors represent semantics. [Methods] We used convolutional neural networks to obtain feature vectors at the word, topic and character granularities. The three feature vectors were then combined by a “merging gate” mechanism to generate the final text vector. Finally, we evaluated the model in a text classification experiment. [Results] The accuracy (92.56%), precision (92.33%), recall (92.07%) and F-score (92.20%) were 2.40%, 2.05%, 1.77% and 1.91% higher, respectively, than those of Text-CNN. [Limitations] Long-distance dependency features still need to be included, and the corpus size needs to be expanded. [Conclusions] The proposed model represents text semantics better. © 2019 The Author(s).
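The abstract does not give the formula behind the “merging gate”, so the following is only a minimal sketch of one common way to fuse several feature vectors with a gate. It assumes each granularity gets a scalar score from a hypothetical weight vector, and that the scores are normalized with a softmax. The names `merge_gate`, `gate_weights` and the example values are illustrative assumptions, not the authors' implementation.

```python
import math


def softmax(scores):
    """Normalize raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def merge_gate(features, gate_weights):
    """Fuse equal-length feature vectors into a single vector.

    Assumption: each feature vector is scored by a dot product with its own
    (hypothetical) gate weight vector; the softmax of those scores gives one
    gate value per granularity, and the output is the gated sum.
    """
    scores = [
        sum(w * x for w, x in zip(weights, vec))
        for weights, vec in zip(gate_weights, features)
    ]
    gates = softmax(scores)
    dim = len(features[0])
    merged = [
        sum(g * vec[i] for g, vec in zip(gates, features))
        for i in range(dim)
    ]
    return gates, merged


# Toy stand-ins for the word-, topic- and character-granularity CNN outputs.
word_vec = [0.2, 0.8, -0.1]
topic_vec = [0.5, 0.1, 0.3]
char_vec = [-0.4, 0.6, 0.9]

# Hypothetical gate parameters; in a trained model these would be learned.
gate_weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

gates, text_vec = merge_gate([word_vec, topic_vec, char_vec], gate_weights)
print(gates)
print(text_vec)
```

Because the gate values are positive and sum to 1, each component of the merged vector lies between the smallest and largest corresponding components of the three inputs. A trained gate could therefore emphasize whichever granularity is most informative for a given text.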
Pages: 45-52
Page count: 7