Latent-Variable Generative Models for Data-Efficient Text Classification

Cited by: 0
Authors
Ding, Xiaoan [1 ]
Gimpel, Kevin [2 ]
Affiliations
[1] Univ Chicago, Chicago, IL 60637 USA
[2] Toyota Technol Inst Chicago, Chicago, IL 60637 USA
Source
2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP 2019): Proceedings of the Conference, 2019
Keywords
DOI: not available
Chinese Library Classification: TP18 [Theory of Artificial Intelligence]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
Generative classifiers offer potential advantages over their discriminative counterparts, namely in the areas of data efficiency, robustness to data shift and adversarial examples, and zero-shot learning (Ng and Jordan, 2002; Yogatama et al., 2017; Lewis and Fan, 2019). In this paper, we improve generative text classifiers by introducing discrete latent variables into the generative story, and explore several graphical model configurations. We parameterize the distributions using standard neural architectures used in conditional language modeling and perform learning by directly maximizing the log marginal likelihood via gradient-based optimization, which avoids the need to do expectation-maximization. We empirically characterize the performance of our models on six text classification datasets. The choice of where to include the latent variable has a significant impact on performance, with the strongest results obtained when using the latent variable as an auxiliary conditioning variable in the generation of the textual input. This model consistently outperforms both the generative and discriminative classifiers in small-data settings. We analyze our model by using it for controlled generation, finding that the latent variable captures interpretable properties of the data, even with very small training sets.
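The objective described in the abstract, directly maximizing the log marginal likelihood of a generative classifier with a discrete latent variable, can be illustrated by exact enumeration over the latent values in log space. The sketch below is a minimal toy illustration, not the authors' neural implementation: the probability tables and function names are hypothetical, standing in for the paper's conditional language-model parameterizations.

```python
import math

def logsumexp(vals):
    """Numerically stable log(sum(exp(v) for v in vals))."""
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))

def log_marginal_likelihood(log_pz_given_y, log_px_given_yz):
    """log p(x | y) = log sum_z p(z | y) p(x | y, z).

    The discrete latent z is marginalized exactly by summing in log
    space, which is what lets learning maximize the log marginal
    likelihood directly instead of running expectation-maximization.
    """
    return logsumexp([lz + lx for lz, lx in zip(log_pz_given_y, log_px_given_yz)])

def classify(log_py, log_pz_given_y, log_px_given_yz):
    """Generative prediction: argmax_y [log p(y) + log p(x | y)]."""
    scores = [
        log_py[y] + log_marginal_likelihood(log_pz_given_y[y], log_px_given_yz[y])
        for y in range(len(log_py))
    ]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy example (hypothetical numbers): p(z|y) = [0.6, 0.4], p(x|y,z) = [0.2, 0.7]
# → p(x|y) = 0.6*0.2 + 0.4*0.7 = 0.40
```

With neural parameterizations, the same log-sum-exp over latent values remains differentiable, so the whole objective can be optimized with gradient-based methods as the abstract describes.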
Pages: 507-517 (11 pages)
References (44 in total)
[1] [Anonymous], 2018, P 2018 C N AM CHAPTE
[2] [Anonymous], 2009, ICML
[3] Bikel DM, Schwartz R, Weischedel RM. An algorithm that learns what's in a name. Machine Learning, 1999, 34(1-3): 211-231
[4] Black E, 1993, 31st Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference, p. 31
[5] Blunsom Phil, 2008, ACL 2008, Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics, June 15-20, 2008, Columbus, Ohio, p. 200
[6] Brants T, 2000, 6th Applied Natural Language Processing Conference / 1st Meeting of the North American Chapter of the Association for Computational Linguistics, Proceedings of the Conference and Proceedings of the ANLP-NAACL 2000 Student Research Workshop, p. 224
[7] Burda Yuri, 2016, P INT C LEARNING REP
[8] Chung J, 2015, Advances in Neural Information Processing Systems, Vol. 28
[9] Church K. W., 1988, Second Conference on Applied Natural Language Processing, p. 136
[10] Collins M, 1997, 35th Annual Meeting of the Association for Computational Linguistics and the 8th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference, p. 16