Towards Better Word Alignment in Transformer

Cited by: 5
Authors:
Song, Kai [1 ,2 ]
Zhou, Xiaoqing [1 ]
Yu, Heng [3 ]
Huang, Zhongqiang [3 ]
Zhang, Yue [4 ]
Luo, Weihua [3 ]
Duan, Xiangyu [1 ]
Zhang, Min [1 ]
Affiliations:
[1] Soochow Univ, Suzhou 215000, Peoples R China
[2] DAMO Acad, Hangzhou 310051, Peoples R China
[3] Alibaba DAMO Acad, Hangzhou 310051, Peoples R China
[4] Westlake Univ, Hangzhou 310000, Peoples R China
Funding:
National Natural Science Foundation of China
Keywords:
Decoding; Data models; Training; Context modeling; Standards; Speech processing; Error analysis; Neural network; neural machine translation; Transformer; word alignment; language model pre-training; alignment concentration;
DOI:
10.1109/TASLP.2020.2998278
Chinese Library Classification:
O42 [Acoustics]
Discipline Codes:
070206; 082403
Abstract:
While neural models based on the Transformer architecture achieve state-of-the-art translation performance, it is well known that the learned target-to-source attention weights do not correlate well with word alignment. There is growing interest in inducing accurate word alignments in the Transformer, given their important role in practical applications such as dictionary-guided translation and interactive translation. In this article, we extend and improve recent work on unsupervised learning of word alignment in the Transformer along two dimensions: a) parameter initialization from a pre-trained cross-lingual language model, which leverages large amounts of monolingual data to learn robust contextualized word representations, and b) regularization of the training objective to directly model characteristics of word alignments, so that favorable word alignments receive more concentrated probabilities. Experiments on benchmark data sets of three language pairs show that the proposed methods significantly reduce the alignment error rate (AER), by 3.7 to 7.7 points depending on the language pair, over two recent approaches to improving the Transformer's word alignment. Moreover, our methods achieve better alignment results than GIZA++ on certain test sets.
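The AER reported above is the standard metric of Och and Ney (2003): given a predicted link set A, gold sure links S, and gold possible links P (with S ⊆ P), AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|). A minimal Python implementation for reference:

    def alignment_error_rate(predicted, sure, possible):
        """Alignment Error Rate (Och & Ney, 2003).

        predicted: set of (src_idx, tgt_idx) links produced by the model (A)
        sure:      set of sure gold links (S)
        possible:  set of possible gold links (P), with S a subset of P
        """
        a_and_s = len(predicted & sure)
        a_and_p = len(predicted & possible)
        return 1.0 - (a_and_s + a_and_p) / (len(predicted) + len(sure))

    # A perfect prediction scores 0.0:
    # alignment_error_rate({(0, 0), (1, 2)}, {(0, 0)}, {(0, 0), (1, 2)}) == 0.0

The abstract does not spell out the concentration regularizer, only that it rewards alignment-like attention distributions with more concentrated probabilities. One common way to encourage such concentration is an entropy penalty on the attention rows; the PyTorch sketch below illustrates that idea under stated assumptions (the tensor shape, the name lambda_c, and the way the penalty is combined with the translation loss are illustrative, not the paper's formulation):

    import torch

    def attention_entropy_penalty(attn):
        """Mean per-target-token entropy of attention weights.

        attn: tensor of shape (batch, tgt_len, src_len) whose rows are
        probability distributions over source positions. Lower entropy
        means the mass is concentrated on fewer source words, i.e.
        sharper, more alignment-like attention.
        """
        eps = 1e-9
        entropy = -(attn * (attn + eps).log()).sum(dim=-1)  # (batch, tgt_len)
        return entropy.mean()

    # Hypothetical combined objective: translation cross-entropy plus a
    # small weight lambda_c on the concentration penalty.
    # loss = ce_loss + lambda_c * attention_entropy_penalty(attn_probs)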
Pages: 1801-1812 (12 pages)