Data Selection via Semi-supervised Recursive Autoencoders for SMT Domain Adaptation

Citations: 0
Authors
Lu, Yi [1]
Wong, Derek F. [1]
Chao, Lidia S. [1]
Wang, Longyue [1]
Affiliations
[1] Univ Macau, Dept Comp & Informat Sci, Nat Language Proc & Portuguese Chinese Machine Tr, Macau, Peoples R China
Source
MACHINE TRANSLATION, CWMT 2014 | 2014 / Vol. 493
Keywords
Statistical Machine Translation; Domain Adaptation; Data Selection; Semi-Supervised; Recursive Autoencoders
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we present a novel data selection approach based on semi-supervised recursive autoencoders. The model is trained to capture domain-specific features and is then used to detect sentences relevant to a specific domain in a large general-domain corpus. The selected data are used to adapt the language model and translation model to the target domain. Experiments were conducted on an in-domain corpus (IWSLT2014 Chinese-English TED Talk) and a general-domain corpus (UM-Corpus). We evaluated the proposed data selection model both intrinsically and extrinsically, measuring the selection success rate (F-score) on pseudo data as well as the translation quality (BLEU score) of the adapted SMT systems. Empirical results show that the proposed approach outperforms the state-of-the-art selection approach.
Pages: 13-23
Page count: 11