Low-resource text classification using domain-adversarial learning

Cited by: 11
Authors
Griesshaber, Daniel [1 ]
Ngoc Thang Vu [2 ]
Maucher, Johannes [1 ]
Affiliations
[1] Stuttgart Media Univ, Nobelstr 10, D-70569 Stuttgart, Germany
[2] Univ Stuttgart, Inst Nat Language Proc IMS, Pfaffenwaldring 5b, D-70569 Stuttgart, Germany
Keywords
NLP; Low-resource; Deep learning; Domain-adversarial
DOI
10.1016/j.csl.2019.101056
Chinese Library Classification
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Deep learning techniques have recently been shown to be successful in many natural language processing tasks, forming the basis of state-of-the-art systems. However, they require large amounts of annotated data, which are often unavailable. This paper explores the use of domain-adversarial learning as a regularizer to avoid overfitting when training domain-invariant features for deep, complex neural networks in low-resource and zero-resource settings in new target domains or languages. In the case of new languages, we show that monolingual word vectors can be used directly for training without prior alignment: their projection into a common space can be learned ad hoc at training time, matching the final performance of pretrained multilingual word vectors. (C) 2019 Elsevier Ltd. All rights reserved.
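
For reference, the sketch below illustrates the general gradient-reversal mechanism that domain-adversarial training (in the DANN style of Ganin & Lempitsky, 2015) builds on: a shared feature extractor is trained to fool a domain discriminator while still serving the task classifier. This is a minimal illustration under assumed shapes and hyperparameters, not the authors' actual architecture; the class names (GradReverse, DomainAdversarialClassifier) and all dimensions here are invented for the example.

# Minimal domain-adversarial training sketch (DANN-style gradient reversal).
# Illustrative only: names, dimensions, and loss weighting are assumptions,
# not the model described in the paper.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; scales gradients by -lambda backward."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reversed gradient flows into the shared feature extractor.
        return -ctx.lambd * grad_output, None

class DomainAdversarialClassifier(nn.Module):
    def __init__(self, input_dim=300, hidden_dim=128, n_classes=2, n_domains=2):
        super().__init__()
        # Shared features that the adversarial signal pushes toward
        # domain invariance.
        self.features = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
        self.label_head = nn.Linear(hidden_dim, n_classes)
        self.domain_head = nn.Linear(hidden_dim, n_domains)

    def forward(self, x, lambd=1.0):
        h = self.features(x)
        y_logits = self.label_head(h)
        # The domain head sees reversed gradients, so minimizing its loss
        # trains the features to confuse the domain discriminator.
        d_logits = self.domain_head(GradReverse.apply(h, lambd))
        return y_logits, d_logits

# Toy usage: total loss = task loss + domain-confusion loss.
model = DomainAdversarialClassifier()
x = torch.randn(8, 300)        # e.g. averaged word vectors (assumed input)
y = torch.randint(0, 2, (8,))  # task labels (source domain only, in practice)
d = torch.randint(0, 2, (8,))  # domain labels (source vs. target)
y_logits, d_logits = model(x, lambd=0.5)
loss = (nn.functional.cross_entropy(y_logits, y)
        + nn.functional.cross_entropy(d_logits, d))
loss.backward()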
Pages: 11