Towards Better Word Alignment in Transformer

Cited by: 5
Authors
Song, Kai [1 ,2 ]
Zhou, Xiaoqing [1 ]
Yu, Heng [3 ]
Huang, Zhongqiang [3 ]
Zhang, Yue [4 ]
Luo, Weihua [3 ]
Duan, Xiangyu [1 ]
Zhang, Min [1 ]
Affiliations
[1] Soochow Univ, Suzhou 215000, Peoples R China
[2] DAMO Acad, Hangzhou 310051, Peoples R China
[3] Alibaba DAMO Acad, Hangzhou 310051, Peoples R China
[4] Westlake Univ, Hangzhou 310000, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Decoding; Data models; Training; Context modeling; Standards; Speech processing; Error analysis; Neural network; neural machine translation; Transformer; word alignment; language model pre-training; alignment concentration;
DOI
10.1109/TASLP.2020.2998278
CLC classification
O42 [Acoustics];
Subject classification codes
070206; 082403;
Abstract
While neural models based on the Transformer architecture achieve state-of-the-art translation performance, it is well known that the learned target-to-source attentions do not correlate well with word alignment. There is increasing interest in inducing accurate word alignment in the Transformer, owing to its important role in practical applications such as dictionary-guided translation and interactive translation. In this article, we extend and improve recent work on unsupervised learning of word alignment in the Transformer along two dimensions: a) parameter initialization from a pre-trained cross-lingual language model, leveraging large amounts of monolingual data to learn robust contextualized word representations, and b) regularization of the training objective to directly model characteristics of word alignments, so that favorable word alignments receive more concentrated probabilities. Experiments on benchmark data sets of three language pairs show that the proposed methods significantly reduce the alignment error rate (AER), by at least 3.7 to 7.7 points on each language pair, over two recent works on improving the Transformer's word alignment. Moreover, our methods achieve better alignment results than GIZA++ on certain test sets.
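The abstract treats target-to-source attention as a soft alignment and evaluates with the alignment error rate (AER). As a minimal illustration only (a sketch, not the authors' implementation; function and variable names are assumptions), the snippet below extracts a hard alignment from an attention matrix by greedy argmax and scores it with the standard AER formula of Och and Ney, where lower AER means better agreement with the gold sure/possible links:

```python
def attention_to_alignment(attn):
    """Greedy hard alignment: link each target position j to the
    source position i with the highest attention weight attn[j][i].
    Returns a set of (source, target) index pairs."""
    return {(row.index(max(row)), j) for j, row in enumerate(attn)}


def aer(hypothesis, sure, possible):
    """Alignment Error Rate: 1 - (|A∩S| + |A∩P|) / (|A| + |S|).
    By convention, sure links are also counted as possible links."""
    a, s = set(hypothesis), set(sure)
    p = set(possible) | s  # sure links are a subset of possible links
    return 1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))


# Toy 2x2 target-to-source attention matrix (rows = target positions).
attn = [[0.9, 0.1],
        [0.2, 0.8]]
hyp = attention_to_alignment(attn)  # {(0, 0), (1, 1)}
score = aer(hyp, sure={(0, 0), (1, 1)}, possible=set())  # 0.0 (perfect)
```

A perfect match between the hypothesized links and the sure links gives AER 0; each wrong link raises the score toward 1.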
Pages: 1801-1812 (12 pages)