A tree does not make a well-formed sentence: Improving syntactic string-to-tree statistical machine translation with more linguistic knowledge

Cited by: 1
Authors
Sennrich, Rico [1]
Williams, Philip [1]
Huck, Matthias [1]
Affiliation
[1] Univ Edinburgh, Sch Informat, Edinburgh EH8 9AB, Midlothian, Scotland
Source
COMPUTER SPEECH AND LANGUAGE | 2015 / Vol. 32 / Issue 1
Funding
Swiss National Science Foundation
Keywords
Statistical machine translation; Syntactic translation models; String-to-tree models; Morphology
DOI
10.1016/j.csl.2014.09.002
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Synchronous context-free grammars (SCFGs) can be learned from parallel texts that are annotated with target-side syntax, and can produce translations by building target-side syntactic trees from source strings. Ideally, producing syntactic trees would entail that the translation is grammatically well-formed, but in reality, this is often not the case. Focusing on translation into German, we discuss various ways in which string-to-tree translation models over- or undergeneralise. We show how these problems can be addressed by choosing a suitable parser and modifying its output, by introducing linguistic constraints that enforce morphological agreement and constrain subcategorisation, and by modelling the productive generation of German compounds. (C) 2014 The Authors. Published by Elsevier Ltd.
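The abstract mentions linguistic constraints that enforce morphological agreement. As a rough illustration only, and not the authors' implementation, agreement inside a German noun phrase can be checked by intersecting the sets of possible morphological analyses (case, number, gender) of each word: a non-empty intersection means the words can agree, an empty one signals a violation that a decoder could reject. The toy lexicon `LEXICON` and the function `agree` are hypothetical and cover only a handful of word forms.

```python
from itertools import product

GENDERS = ("masc", "fem", "neut")

# Hypothetical toy lexicon: each word form maps to the set of
# (case, number, gender) analyses it can carry. Real systems would
# obtain these from a morphological analyser; this list is partial.
LEXICON = {
    "das": {("nom", "sg", "neut"), ("acc", "sg", "neut")},
    "der": {("nom", "sg", "masc"), ("gen", "sg", "fem"), ("dat", "sg", "fem")}
           | {("gen", "pl", g) for g in GENDERS},
    "die": {("nom", "sg", "fem"), ("acc", "sg", "fem")}
           | {(c, "pl", g) for c, g in product(("nom", "acc"), GENDERS)},
    "Haus": {("nom", "sg", "neut"), ("acc", "sg", "neut"), ("dat", "sg", "neut")},
    "Häuser": {("nom", "pl", "neut"), ("acc", "pl", "neut"), ("gen", "pl", "neut")},
}


def agree(words):
    """Return the analyses shared by all words; an empty set means no agreement."""
    # set.intersection with several arguments returns a new set,
    # so the lexicon entries are never modified.
    return set.intersection(*(LEXICON[w] for w in words))


if __name__ == "__main__":
    print(sorted(agree(["das", "Haus"])))  # nominative or accusative singular neuter
    print(agree(["der", "Haus"]))          # empty set: agreement violation
    print(agree(["der", "Häuser"]))        # genitive plural neuter
```

Keeping ambiguity as sets until the phrase is complete avoids committing early to one reading of forms such as "der", which is ambiguous between several cases, numbers and genders.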
Pages: 27-45
Number of pages: 19