Emotion-Aware Transformer Encoder for Empathetic Dialogue Generation

Cited by: 5
Authors
Goel, Raman [1 ]
Susan, Seba [1 ]
Vashisht, Sachin [1 ]
Dhanda, Armaan [1 ]
Affiliations
[1] Delhi Technol Univ, Dept Informat Technol, New Delhi 110042, India
Source
2021 9TH INTERNATIONAL CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION WORKSHOPS AND DEMOS (ACIIW) | 2021
Keywords
Transformer-XL; Empathetic dialogue generation; Affective state; Encoder-decoder model;
DOI
10.1109/ACIIW52867.2021.9666315
Chinese Library Classification (CLC)
TP [Automation technology; computer technology]
Subject Classification Code
0812
Abstract
Modern-day conversational agents are trained to emulate the manner in which humans communicate. To bond emotionally with the user, these virtual agents need to be aware of the user's affective state. Transformers are the current state of the art in sequence-to-sequence learning, which involves training an encoder-decoder model on word embeddings from utterance-response pairs. We propose an emotion-aware transformer encoder that captures the emotional quotient of the user utterance in order to generate human-like empathetic responses. The contributions of our paper are as follows: 1) an emotion detector module trained on the input utterances determines the affective state of the user in the initial phase; 2) a novel transformer encoder is proposed that adds and normalizes the word embedding with the emotion embedding, thereby integrating the semantic and affective aspects of the input utterance; and 3) the encoder and decoder stacks follow the Transformer-XL architecture, the recent state of the art in language modeling. Experiments on the benchmark Facebook AI empathetic dialogue dataset confirm the efficacy of our model, which achieves higher BLEU-4 scores for the generated responses than existing methods. Emotionally intelligent virtual agents are now a reality, and the inclusion of affect as a modality in human-machine interfaces is foreseen in the immediate future.
Pages: 6
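The core architectural idea in the abstract is an encoder embedding layer that adds the word embedding to an emotion embedding (derived from the emotion detector's prediction for the utterance) and layer-normalizes the sum before it enters the Transformer-XL encoder stack. The snippet below is a minimal PyTorch sketch of that add-and-normalize fusion step; the class name EmotionAwareEmbedding, the hyperparameters, and the toy dimensions are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of the add-and-normalize fusion described in the abstract:
# a token's word embedding is summed with an utterance-level emotion embedding
# (from an external emotion detector) and then layer-normalized before being
# fed to the Transformer-XL encoder stack. Names and sizes are illustrative
# assumptions, not the authors' code.
import torch
import torch.nn as nn


class EmotionAwareEmbedding(nn.Module):
    def __init__(self, vocab_size: int, num_emotions: int, d_model: int = 512):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, d_model)       # semantic content
        self.emotion_emb = nn.Embedding(num_emotions, d_model)  # affective state
        self.norm = nn.LayerNorm(d_model)

    def forward(self, token_ids: torch.Tensor, emotion_id: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len); emotion_id: (batch,) label predicted by
        # the emotion detector module for each user utterance.
        words = self.word_emb(token_ids)                     # (batch, seq_len, d_model)
        emotion = self.emotion_emb(emotion_id).unsqueeze(1)  # (batch, 1, d_model)
        # Add the utterance-level emotion embedding to every token embedding,
        # then normalize, fusing semantic and affective information.
        return self.norm(words + emotion)


# Usage example with toy dimensions.
emb = EmotionAwareEmbedding(vocab_size=30000, num_emotions=32)
tokens = torch.randint(0, 30000, (2, 10))  # two utterances of 10 tokens each
emotions = torch.tensor([3, 17])           # emotion-detector outputs per utterance
out = emb(tokens, emotions)                # -> torch.Size([2, 10, 512])
```

The fused embeddings would then replace the plain word embeddings at the input of the Transformer-XL encoder, so the attention layers see both what was said and the detected emotional state.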