G-Transformer for Document-level Machine Translation

Times Cited: 0
Authors
Bao, Guangsheng [1 ,2 ]
Zhang, Yue [1 ,2 ]
Teng, Zhiyang [1 ,2 ]
Chen, Boxing [3 ]
Luo, Weihua [3 ]
Affiliations
[1] Westlake Univ, Sch Engn, Hangzhou, Peoples R China
[2] Westlake Inst Adv Study, Inst Adv Technol, Hangzhou, Peoples R China
[3] Alibaba Grp Inc, DAMO Acad, Hangzhou, Peoples R China
Source
59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (ACL-IJCNLP 2021), VOL 1 | 2021
Keywords
DOI
Not available
CLC Number
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Document-level MT models are still far from satisfactory. Existing work extends the translation unit from a single sentence to multiple sentences. However, studies show that when the translation unit is further enlarged to a whole document, supervised training of Transformer can fail. In this paper, we find that such failure is caused not by overfitting but by sticking around local minima during training. Our analysis shows that the increased complexity of target-to-source attention is one reason for the failure. As a solution, we propose G-Transformer, which introduces a locality assumption as an inductive bias into Transformer, reducing the hypothesis space of the attention from target to source. Experiments show that G-Transformer converges faster and more stably than Transformer, achieving new state-of-the-art BLEU scores under both non-pretraining and pre-training settings on three benchmark datasets.
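The abstract's central mechanism is a locality restriction on target-to-source attention. The following is a minimal sketch of how such a sentence-group locality mask could be constructed; it is not the authors' implementation. The choice of PyTorch, the function name group_attention_mask, and the toy group-ID tensors are assumptions made here for illustration, and the actual G-Transformer additionally combines this kind of local attention with global components not shown below.

```python
# Minimal sketch (an assumption, not the paper's code): a cross-attention mask
# encoding the locality assumption from the abstract -- each target token may
# attend only to source tokens belonging to the same sentence group.
import torch


def group_attention_mask(src_groups: torch.Tensor,
                         tgt_groups: torch.Tensor) -> torch.Tensor:
    """Return a boolean mask of shape (tgt_len, src_len).

    True  -> attention allowed (same sentence group)
    False -> attention blocked (different group)
    """
    # Broadcast-compare every target group ID against every source group ID.
    return tgt_groups.unsqueeze(1) == src_groups.unsqueeze(0)


if __name__ == "__main__":
    # Toy document: the source has three sentences (groups 0, 1, 2),
    # the target has two (groups 0, 1). Group IDs are per token.
    src_groups = torch.tensor([0, 0, 0, 1, 1, 2, 2, 2])
    tgt_groups = torch.tensor([0, 0, 1, 1, 1])

    mask = group_attention_mask(src_groups, tgt_groups)

    # Apply the mask to dummy attention scores before the softmax, as in
    # standard scaled dot-product attention.
    scores = torch.randn(len(tgt_groups), len(src_groups))
    scores = scores.masked_fill(~mask, float("-inf"))
    attn = torch.softmax(scores, dim=-1)
    print(attn)  # Each target row places zero weight outside its own group.
```

Restricting attention this way shrinks the hypothesis space the model must search, which is the inductive bias the abstract credits for faster and more stable convergence on whole-document inputs.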
Pages: 3442-3455
Number of Pages: 14