Multi-label Emotion Classification for Microblog Based on CNN Feature Space

Authors
Sun S. [1]
He Y. [1, 2]
Affiliations
[1] School of Computer, Wuhan Univ., Wuhan
[2] State Key Lab. of Software Eng., Wuhan Univ., Wuhan
Source
Sichuan University, 49 (2017)
Keywords
Convolutional neural network; Emotion classification; Multi-label classification; Semantic compositionality; Word embedding
DOI
10.15961/j.jsuese.201600780
Abstract
Emotion evaluation of microblog posts is a multi-label classification problem, but traditional text representation methods, which are usually based on the vector space model, fail to provide sufficiently effective semantic features. Word embedding techniques, which are based on deep learning, capture the syntactic and semantic relations between words well and allow sentence representations to be built effectively through semantic compositionality. A multi-label emotion classification system is proposed. First, word embeddings for Chinese words were learned from a large-scale unlabeled Chinese microblog text dataset. Second, a Convolutional Neural Network (CNN) model was trained as a supervised multi-emotion classifier. Third, the learned CNN model was used to compose feature vectors for microblog sentences. Finally, these sentence vectors were used as semantic features to train a multi-label classifier, which performed the multi-label emotion classification of microblog posts. On the open dataset of the microblog emotion evaluation task of the NLPCC (Natural Language Processing and Chinese Computing) 2013 conference, the best performance of the proposed system improved on the best result among all evaluation submissions by 19.16% on the loose metric and 17.75% on the strict metric. The proposed system also outperformed the previous state-of-the-art result, which was obtained by composing sentence vectors with a Recursive Neural Tensor Network model, by up to 3.66% and 2.89% on the two metrics, respectively. Several multi-label classifiers were used to compare different feature representation methods, and the sentence vectors based on the CNN feature space were shown to carry the most discriminative emotion semantics. An analysis of the training iterations of the CNN model revealed the pattern recognition that takes place during semantic composition. © 2017, Editorial Department of Advanced Engineering Sciences. All rights reserved.
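The third and fourth steps of the abstract (composing a sentence vector with convolution filters, then training a multi-label classifier on those vectors) can be illustrated with a minimal sketch. It is not the authors' implementation. Everything in it is an assumption made for illustration: the toy two-dimensional embeddings, the hand-set filters (in the paper they come from a trained CNN), the function names, and the use of binary relevance with logistic regression as a stand-in for the multi-label classifiers the paper compares.

```python
import math
from typing import List, Sequence

Vector = List[float]

def conv_max_pool(embeddings: Sequence[Vector], filters: Sequence[Vector],
                  biases: Sequence[float], width: int) -> Vector:
    """Compose a sentence vector: one ReLU + max-over-time feature per filter."""
    features = []
    for filt, bias in zip(filters, biases):
        best = 0.0  # ReLU floor; also the value when the sentence is shorter than `width`
        for start in range(len(embeddings) - width + 1):
            # Concatenate the embeddings of `width` consecutive tokens.
            window = [x for vec in embeddings[start:start + width] for x in vec]
            activation = sum(w * x for w, x in zip(filt, window)) + bias
            best = max(best, activation)
        features.append(best)
    return features

def train_binary_relevance(X: Sequence[Vector], Y: Sequence[Sequence[int]],
                           epochs: int = 200, lr: float = 0.5) -> List[Vector]:
    """Train one logistic-regression model per label with plain SGD.
    Each weight vector stores the bias as its last element."""
    n_features, n_labels = len(X[0]), len(Y[0])
    weights = [[0.0] * (n_features + 1) for _ in range(n_labels)]
    for _ in range(epochs):
        for x, y in zip(X, Y):
            for k in range(n_labels):
                z = weights[k][-1] + sum(w * v for w, v in zip(weights[k], x))
                grad = 1.0 / (1.0 + math.exp(-z)) - y[k]
                for j in range(n_features):
                    weights[k][j] -= lr * grad * x[j]
                weights[k][-1] -= lr * grad
    return weights

def predict_labels(weights: Sequence[Vector], x: Vector,
                   threshold: float = 0.5) -> List[int]:
    """Return a 0/1 decision for every label independently."""
    out = []
    for w in weights:
        z = w[-1] + sum(wi * v for wi, v in zip(w, x))
        out.append(1 if 1.0 / (1.0 + math.exp(-z)) >= threshold else 0)
    return out

# Hypothetical toy data: 2-d embeddings and two width-1 filters.
EMBED = {"happy": [1.0, 0.0], "sad": [0.0, 1.0], "and": [0.0, 0.0]}
FILTERS = [[1.0, 0.0], [0.0, 1.0]]
BIASES = [0.0, 0.0]

def sentence_vector(tokens: Sequence[str]) -> Vector:
    return conv_max_pool([EMBED[t] for t in tokens], FILTERS, BIASES, width=1)

train_sentences = [["happy", "and", "sad"], ["happy"], ["sad"], ["and"]]
train_labels = [[1, 1], [1, 0], [0, 1], [0, 0]]  # [happiness, sadness]
model = train_binary_relevance([sentence_vector(s) for s in train_sentences],
                               train_labels)
print(predict_labels(model, sentence_vector(["sad", "and", "happy"])))
```

Binary relevance treats each emotion as an independent yes/no decision, which is the simplest way to let one post carry several emotions at once. Other multi-label strategies could be trained on the same sentence vectors.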
Pages: 162-169
Page count: 7