Design of a Modified Transformer Architecture Based on Relative Position Coding

Cited by: 56
Authors
Zheng, Wenfeng [1 ]
Gong, Gu [1 ]
Tian, Jiawei [1 ]
Lu, Siyu [1 ]
Wang, Ruiyang [1 ]
Yin, Zhengtong [2 ]
Li, Xiaolu [3 ]
Yin, Lirong [4 ]
Affiliations
[1] Univ Elect Sci & Technol China, Sch Automat, Chengdu 610054, Peoples R China
[2] Guizhou Univ, Coll Resource & Environm Engn, Guiyang 550025, Peoples R China
[3] Southwest Univ, Sch Geog Sci, Chongqing 400715, Peoples R China
[4] Louisiana State Univ, Dept Geog & Anthropol, Baton Rouge, LA 70803 USA
Keywords
Relative position embedding; Natural language processing; Attention mechanism; Attention; LSTM
DOI
10.1007/s44196-023-00345-z
CLC number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Natural language processing (NLP) based on deep learning performs well for generative dialogue systems, and the transformer model was a new boost for NLP after the advent of word vectors. In this paper, a Chinese generative dialogue system based on the transformer is designed. It builds the system from a multi-layer transformer decoder alone and uses an incomplete-mask design to realize one-way language generation: the question can perceive context information in both directions, while the reply sentence is generated autoregressively in one direction only. These improvements make the one-way generation of the dialogue task more logical and reasonable, and the performance is better than that of the traditional dialogue-system scheme. Considering the weakness of absolute position coding over long distances, we propose an improvement based on relative position coding in theory and verify it in subsequent experiments. In the transformer module, the self-attention formula is modified: relative position information is added to replace the absolute position coding of the position-embedding layer. The modified model achieves ideal performance on BLEU, embedding average, and grammatical and semantic coherence, enhancing long-distance attention.
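The abstract describes two mechanisms without giving code: an "incomplete" (UniLM-style) attention mask under which question tokens attend bidirectionally while reply tokens are generated causally, and a relative-position term added to the self-attention logits in place of absolute position embeddings. A minimal NumPy sketch of both follows; the function names, the clipping distance `max_dist`, and the Shaw-et-al.-style additive bias are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def unilm_mask(q_len, r_len):
    """Incomplete mask: question tokens attend bidirectionally within
    the question; reply tokens attend to the question plus previously
    generated reply tokens (causal). True = attention allowed."""
    n = q_len + r_len
    mask = np.zeros((n, n), dtype=bool)
    mask[:, :q_len] = True                       # every token sees the question
    mask[q_len:, q_len:] = np.tril(np.ones((r_len, r_len), dtype=bool))
    return mask                                  # question never sees the reply

def rel_attention(x, Wq, Wk, Wv, rel_bias, mask, max_dist=8):
    """Single-head self-attention with an additive relative-position
    bias rel_bias[clip(j - i)] on the logits, replacing absolute
    position embeddings (hypothetical Shaw-style parameterization)."""
    n, d = x.shape
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    logits = q @ k.T / np.sqrt(d)
    # index the learned bias table by clipped relative offset j - i
    idx = np.clip(np.arange(n)[None, :] - np.arange(n)[:, None],
                  -max_dist, max_dist) + max_dist
    logits = logits + rel_bias[idx]
    logits = np.where(mask, logits, -1e9)        # forbid masked positions
    w = np.exp(logits - logits.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)        # row-wise softmax
    return w @ v
```

With `unilm_mask(3, 2)`, rows 0-2 (the question) can attend to each other in both directions, row 3 sees only the question plus itself, and row 4 additionally sees row 3, which is the one-way generation behavior the abstract describes.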
Pages: 17