Chinese Text Classification with Feature Fusion

Authors
Wang Y. [1]
Wang H. [2]
Yu B. [2,3]
Affiliations
[1] Economic and Technical College, Anhui Agricultural University, Hefei
[2] School of Management, Hefei University of Technology, Hefei
[3] Key Laboratory of Process Optimization & Intelligent Decision-Making, Ministry of Education, Hefei University of Technology, Hefei
Funding
National Natural Science Foundation of China
Keywords
Chinese Character Features; Part-of-Speech Tag; Pinyin Character Features; Text Classification; Word-Level Characteristics
DOI
10.11925/infotech.2096-3467.2021.0228
Abstract
[Objective] This paper proposes a new classification model for Chinese texts that are weakly structured and contain spelling errors or homophones. [Methods] First, we constructed a multi-feature fusion method for text classification based on the traditional feature-fusion model. Second, we combined word-level features extended with part-of-speech features, Chinese character features, and Pinyin letter features to create a multi-feature semantic representation. Third, we fed this multi-feature representation into a BiGRU to obtain contextual semantics, which were then processed by a multi-channel CNN to extract the main features. Finally, we merged these features and passed them to a softmax layer to complete the classification task and predict the category labels. [Results] The accuracy of our multi-feature fusion model reached 83.3% and 91.1% on two datasets, 7% higher than that of the existing model. [Limitations] More research is needed to examine the model on larger datasets. [Conclusions] The proposed model can effectively perform Chinese text classification. © 2021 The Author(s).
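The pipeline described under [Methods] can be sketched as a toy forward pass. This is a minimal, untrained, pure-Python illustration, not the authors' implementation, and it rests on several assumptions that the abstract does not state: each word's input vector is the concatenation of a word embedding, a POS embedding, the mean of its character embeddings, and the mean of its Pinyin-syllable embeddings; "multi-channel CNN" is read as parallel 1-D convolutions with different kernel widths, each followed by ReLU and max-pooling over time; and the dimensions, POS tags, and the names `fuse_token`, `GRU`, `bigru`, `conv_maxpool`, and `classify` are hypothetical. All weights are random, so the printed probabilities carry no meaning; the sketch only shows how the fused features flow through the BiGRU, the CNN, and the softmax layer.

```python
import math
import random

random.seed(0)  # fixed seed so the toy run is reproducible

DIM = 4          # size of each feature embedding (hypothetical)
HID = 3          # GRU hidden size per direction (hypothetical)
N_FILTERS = 2    # filters per CNN channel (hypothetical)
WIDTHS = (2, 3)  # one "channel" per kernel width (assumption)
N_CLASSES = 2    # number of category labels (hypothetical)

def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def mean(vectors):
    return [sum(xs) / len(vectors) for xs in zip(*vectors)]

class Embedding:
    """Toy lookup table: assigns a random vector to each new symbol."""
    def __init__(self, dim):
        self.dim, self.table = dim, {}
    def __call__(self, symbol):
        if symbol not in self.table:
            self.table[symbol] = [random.uniform(-0.5, 0.5) for _ in range(self.dim)]
        return self.table[symbol]

word_emb, pos_emb, char_emb, pinyin_emb = (Embedding(DIM) for _ in range(4))

def fuse_token(word, pos, pinyins):
    """Concatenate word, POS, mean character, and mean Pinyin vectors (assumed scheme)."""
    chars = mean([char_emb(c) for c in word])
    sounds = mean([pinyin_emb(p) for p in pinyins])
    return word_emb(word) + pos_emb(pos) + chars + sounds  # length 4 * DIM

class GRU:
    """Single-direction GRU; biases omitted for brevity."""
    def __init__(self, in_dim, hid):
        self.hid = hid
        self.W = {g: rand_matrix(hid, in_dim) for g in "zrn"}
        self.U = {g: rand_matrix(hid, hid) for g in "zrn"}
    def gate(self, g, x, h, act):
        return [act(a + b) for a, b in zip(matvec(self.W[g], x), matvec(self.U[g], h))]
    def run(self, xs):
        h, outputs = [0.0] * self.hid, []
        for x in xs:
            z = self.gate("z", x, h, sigmoid)     # update gate
            r = self.gate("r", x, h, sigmoid)     # reset gate
            rh = [ri * hi for ri, hi in zip(r, h)]
            n = self.gate("n", x, rh, math.tanh)  # candidate state
            h = [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]
            outputs.append(h)
        return outputs

def bigru(fwd, bwd, xs):
    """Concatenate forward states with time-realigned backward states."""
    backward = bwd.run(xs[::-1])[::-1]
    return [f + b for f, b in zip(fwd.run(xs), backward)]

def conv_maxpool(seq, filters, width):
    """1-D convolution over time, ReLU, then max-pooling over time."""
    pooled = []
    for f in filters:
        best = 0.0  # ReLU floor; stays 0 if the sequence is shorter than width
        for t in range(len(seq) - width + 1):
            window = [v for step in seq[t:t + width] for v in step]
            best = max(best, sum(a * b for a, b in zip(f, window)))
        pooled.append(best)
    return pooled

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

fwd, bwd = GRU(4 * DIM, HID), GRU(4 * DIM, HID)
filters = {w: rand_matrix(N_FILTERS, w * 2 * HID) for w in WIDTHS}
W_out = rand_matrix(N_CLASSES, N_FILTERS * len(WIDTHS))

def classify(tokens):
    xs = [fuse_token(*tok) for tok in tokens]  # multi-feature fusion
    hs = bigru(fwd, bwd, xs)                   # contextual semantics
    feats = [v for w in WIDTHS for v in conv_maxpool(hs, filters[w], w)]
    return softmax(matvec(W_out, feats))       # merged CNN features -> softmax

# (word, POS tag, Pinyin syllables); the tags are illustrative only
sentence = [("手机", "n", ["shou", "ji"]), ("很", "d", ["hen"]), ("好用", "a", ["hao", "yong"])]
probs = classify(sentence)
print([round(p, 3) for p in probs])
```

Averaging the character and Pinyin vectors per word is one simple way to keep a single vector per word so the BiGRU sees one sequence; the paper may align these features differently (for example at the character level), which this sketch does not attempt. Pinyin vectors are included because homophone errors change the characters but not the pronunciation.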
Pages: 1-14
Page count: 13