MO-Transformer: Extract High-Level Relationship Between Words for Neural Machine Translation

Cited by: 0
Authors
Duan, Sufeng [1 ]
Zhao, Hai [2 ,3 ,4 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Sch Cyber Sci & Engn, Shanghai 200240, Peoples R China
[2] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai 200240, Peoples R China
[3] Shanghai Jiao Tong Univ, Key Lab Shanghai Educ Commiss Intelligent Interact & Cognit Engn, Shanghai 200240, Peoples R China
[4] Shanghai Key Lab Trusted Data Circulat & Governance, Shanghai 200240, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Encoding; Transformers; Syntactics; Bidirectional control; Linguistics; Context modeling; Training; Neural machine translation; Decoding; Semantics; Artificial intelligence; natural language processing; neural machine translation; transformer;
DOI
10.1109/TASLP.2024.3507556
Chinese Library Classification
O42 [Acoustics];
Discipline Codes
070206; 082403;
Abstract
In this paper, we propose an explanation of representation for self-attention network (SAN) based neural sequence encoders, which regards the information captured by the model as graph structures and the encoding process of the model as the generation of these graph structures. The proposed explanation applies to existing work on SAN-based models, accounts for the relationship among the ability to capture structural or linguistic information, the depth of the model, and the length of the sentence, and can also be extended to other models such as recurrent neural network based ones. Based on our explanation, we also propose a multigraph formulation called Multi-order-Graph (MoG) that models the graph structures captured by a SAN-based model as subgraphs of the MoG and casts the encoding of the SAN-based model as the generation of the MoG. We further introduce MO-Transformer, which strengthens the ability to capture multiple subgraphs of different orders and focuses on subgraphs of high orders. Experimental results on multiple neural machine translation tasks show that MO-Transformer yields effective performance improvements.
Pages: 5065-5077
Page count: 13
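As a reading aid for the abstract above: the following is a minimal illustrative sketch of viewing one self-attention head's weight matrix as a weighted graph over tokens, where composing attention across layers reaches higher-order relations. It is not the MO-Transformer implementation from the paper; the sentence, dimensions, and the 0.3 threshold are all hypothetical choices for this demo.

```python
# Toy sketch only: NOT the paper's MO-Transformer code.
# It illustrates reading self-attention weights as a token graph.
import numpy as np

rng = np.random.default_rng(0)

tokens = ["the", "cat", "sat", "down"]   # toy sentence
n, d_model = len(tokens), 8

# Random token representations standing in for learned embeddings.
X = rng.normal(size=(n, d_model))
Wq = rng.normal(size=(d_model, d_model))
Wk = rng.normal(size=(d_model, d_model))

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# One self-attention head: the attention matrix A can be read as the
# adjacency matrix of a directed, weighted graph whose nodes are tokens.
A = softmax((X @ Wq) @ (X @ Wk).T / np.sqrt(d_model))

# First-order edges: token pairs directly linked by strong attention.
first_order = A > 0.3

# Composing attention across two "layers" chains edges, i.e. it reaches
# higher-order relations (paths of length 2 in the token graph).
second_order = (A @ A) > 0.3

print("1st-order edges:", np.argwhere(first_order))
print("2nd-order edges:", np.argwhere(second_order))
```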