Investigating Capsule Network and Semantic Feature on Hyperplanes for Text Classification

Cited: 0
Authors
Du, Chunning [1 ,2 ]
Sun, Haifeng [1 ,2 ]
Wang, Jingyu [1 ,2 ]
Qi, Qi [1 ,2 ]
Liao, Jianxin [1 ,2 ]
Wang, Chun [1 ,2 ]
Ma, Bing [1 ,2 ]
Affiliations
[1] Beijing Univ Posts & Telecommun, State Key Lab Networking & Switching Technol, Beijing 100876, Peoples R China
[2] EBUPT Informat Technol Co Ltd, Beijing 100191, Peoples R China
Source
2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF THE CONFERENCE | 2019
Funding
National Natural Science Foundation of China
Keywords
DOI: none
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
As an essential component of natural language processing, text classification has come to rely on deep learning in recent years. Various neural networks have been designed for text classification on the basis of word embeddings. However, polysemy is a fundamental feature of natural language and poses a challenge for text classification: a polysemous word carries more than one sense, yet the word-embedding procedure conflates the different senses of such a word into a single vector. Extracting a distinct representation for each specific sense could therefore lead to fine-grained models with strong generalization ability. It has been demonstrated that the multiple senses of a word actually reside in linear superposition within its word embedding, so that specific senses can be extracted from the original embedding. We therefore propose to use capsule networks to construct vectorized representations of semantics, and to decompose each capsule on hyperplanes to acquire the specific senses. A novel dynamic routing mechanism named 'routing-on-hyperplane' selects the proper sense for the downstream classification task. Our model is evaluated on 6 different datasets, and the experimental results show that it extracts more discriminative semantic features and yields a significant performance gain over baseline methods.
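The hyperplane decomposition mentioned in the abstract can be illustrated with a minimal sketch: projecting a capsule vector onto a sense-specific hyperplane removes the component along the hyperplane's normal, leaving a sense-specific representation. The TransH-style projection formula and the function name below are assumptions for illustration only; the paper's actual routing-on-hyperplane mechanism is more involved.

```python
import numpy as np

def project_to_hyperplane(v, w):
    """Project capsule vector v onto the hyperplane with normal w.

    Illustrative sketch (assumed form, not the paper's exact method):
    subtract the component of v along the unit normal, so the result
    lies in the hyperplane and is orthogonal to w.
    """
    w_unit = w / np.linalg.norm(w)          # normalize the hyperplane normal
    return v - np.dot(v, w_unit) * w_unit   # remove the normal component

# Toy example: a 2-D "capsule" vector and a hyperplane normal.
v = np.array([3.0, 4.0])
w = np.array([0.0, 1.0])
p = project_to_hyperplane(v, w)
# p lies in the hyperplane: its dot product with w is zero.
```

In the paper's setting, one such projection per hyperplane would yield several candidate sense vectors, among which the routing mechanism selects the one useful for classification.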
Pages: 456-465
Page count: 10