Leveraging bilingual terminology to improve machine translation in a CAT environment

Cited by: 11
Authors
Arcan, Mihael [1]
Turchi, Marco [2]
Tonelli, Sara [2]
Buitelaar, Paul [1]
Affiliations
[1] Natl Univ Ireland, Insight Ctr Data Analyt, Galway, Ireland
[2] FBK, Via Sommarive 18, I-38123 Trento, Italy
Funding
Science Foundation Ireland
Keywords
Computational linguistics - Machine translation - Natural language processing systems - Speech transmission - Computer aided language translation - Medical information systems - XML;
DOI
10.1017/S1351324917000195
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405
Abstract
This work focuses on the extraction and integration of automatically aligned bilingual terminology into a Statistical Machine Translation (SMT) system in a Computer Aided Translation scenario. We evaluate the proposed framework that, taking as input a small set of parallel documents, gathers domain-specific bilingual terms and injects them into an SMT system to enhance translation quality. Therefore, we investigate several strategies to extract and align terminology across languages and to integrate it in an SMT system. We compare two terminology injection methods that can be easily used at run-time without altering the normal activity of an SMT system: XML markup and cache-based model. We test the cache-based model on two different domains (information technology and medical) in English, Italian and German, showing significant improvements ranging from 2.23 to 6.78 BLEU points over a baseline SMT system and from 0.05 to 3.03 compared to the widely-used XML markup approach.
Pages: 763-788
Page count: 26