Combining Gated Convolutional Networks and Self-Attention Mechanism for Speech Emotion Recognition

Cited by: 2

Authors:

Li, Chao [1]

Jiao, Jinlong [1]

Zhao, Yiqin [1]

Zhao, Ziping [1]

Affiliations:

[1] Tianjin Normal University, College of Computer and Information Engineering, Tianjin, People's Republic of China

Source:

2019 8th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW) | 2019

Funding:

National Natural Science Foundation of China

Keywords:

speech emotion recognition; gated mechanism; attention mechanism; convolutional neural network

DOI:

10.1109/aciiw.2019.8925283

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Discrete speech emotion recognition (SER), the assignment of a single emotion label to an entire speech utterance, is typically performed as a sequence-to-label task. The predominant approach to SER to date is based on recurrent neural networks, whose success on this task is often linked to their ability to capture unbounded context. In this paper, we introduce new gated convolutional networks and apply them to SER; they can be more efficient than recurrent models because they allow parallelization over sequential tokens. We present a novel model architecture that incorporates a gated convolutional neural network and a temporal attention-based localization method for speech emotion recognition. To the best of the authors' knowledge, this is the first time that such a hybrid architecture has been employed for SER. We demonstrate the effectiveness of our approach on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) corpus. The experimental results show that our proposed model outperforms current state-of-the-art approaches.
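The abstract names two building blocks, a gated convolution and attention over time, without giving layer details. The following is a minimal single-channel sketch of those two ideas in general form: a gated linear unit, in which one convolution output is multiplied by the sigmoid of a second, and softmax-weighted pooling over time steps. It is not the authors' model. All function names, kernel values, and the choice to reuse the hidden activations as attention scores are assumptions made for illustration only.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def conv1d(frames, kernel, bias):
    """'Valid' 1-D convolution over time for one output channel.

    frames: list of T feature vectors, each of length D
    kernel: list of K weight vectors, each of length D
    Returns a list of T - K + 1 scalars.
    """
    k = len(kernel)
    out = []
    for t in range(len(frames) - k + 1):
        s = bias
        for i in range(k):
            s += sum(w * x for w, x in zip(kernel[i], frames[t + i]))
        out.append(s)
    return out


def gated_conv1d(frames, kernel_a, bias_a, kernel_b, bias_b):
    """Gated linear unit: linear branch times sigmoid of the gate branch."""
    linear = conv1d(frames, kernel_a, bias_a)
    gate = conv1d(frames, kernel_b, bias_b)
    return [a * sigmoid(g) for a, g in zip(linear, gate)]


def attention_pool(values, scores):
    """Softmax the scores over time; return (weights, weighted sum of values)."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = sum(w * v for w, v in zip(weights, values))
    return weights, pooled


# Hypothetical input: four frames of two acoustic features each.
frames = [[0.1, 0.2], [0.4, -0.3], [0.0, 0.5], [0.7, 0.1]]
hidden = gated_conv1d(
    frames,
    kernel_a=[[1.0, 0.0], [0.0, 1.0]], bias_a=0.0,
    kernel_b=[[0.5, 0.5], [0.5, 0.5]], bias_b=0.0,
)
# Assumption: the hidden activations double as attention scores here;
# a trained model would learn a separate scoring function.
weights, utterance_repr = attention_pool(hidden, hidden)
```

Because each output time step of the convolution depends only on a fixed window of input frames, all time steps can be computed independently, which is the parallelization property the abstract refers to. The pooled scalar stands in for the utterance-level representation that a classifier would map to a single emotion label.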


Pages: 105-109

Page count: 5
