Self-attention transfer networks for speech emotion recognition

Cited by: 3
Authors
Ziping ZHAO [1]
Keru WANG [1]
Zhongtian BAO [1]
Zixing ZHANG [2]
Nicholas CUMMINS [3,4]
Shihuang SUN [5]
Haishuai WANG [5]
Jianhua TAO [6]
Björn W. SCHULLER [1,2,3]
Affiliations
[1] College of Computer and Information Engineering, Tianjin Normal University
[2] GLAM (Group on Language, Audio & Music), Imperial College London
[3] Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg
[4] Department of Biostatistics and Health Informatics, IoPPN, King's College London
[5] Department of Computer Science and Engineering, Fairfield University
[6] National Laboratory of Pattern Recognition, CASIA
Funding
EU Horizon 2020; National Natural Science Foundation of China
Keywords
DOI
Not available
CLC number
TN912.34 [Speech recognition and equipment]
Subject classification
Abstract
Background A crucial element of human-machine interaction, the automatic detection of emotional states from human speech has long been regarded as a challenging task for machine learning models. One vital challenge in speech emotion recognition (SER) is learning robust and discriminative representations from speech. Although machine learning methods have been widely applied in SER research, the inadequate amount of available annotated data has become a bottleneck impeding the extended application of such techniques (e.g., deep neural networks). To address this issue, we present a deep learning method that combines knowledge transfer and self-attention for SER tasks. Herein, we apply the log-Mel spectrogram with deltas and delta-deltas as inputs. Moreover, given that emotions are time-dependent, we apply temporal convolutional neural networks to model the variations in emotions. We further introduce an attention transfer mechanism based on a self-attention algorithm to learn long-term dependencies. The self-attention transfer network (SATN) in our proposed approach takes advantage of attention transfer to learn attention from speech recognition and then transfers this knowledge into SER. An evaluation on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) dataset demonstrates the effectiveness of the proposed model.
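To make the pipeline described in the abstract concrete, the following is a minimal sketch, not the authors' implementation, of the stated inputs (log-Mel spectrogram stacked with deltas and delta-deltas) together with a scaled dot-product self-attention block and a plausible L2 attention-transfer loss between a speech-recognition teacher and an SER student. It assumes librosa and PyTorch; the names logmel_with_deltas, SelfAttention, and attention_transfer_loss are hypothetical.

```python
# Minimal sketch of SATN-style inputs and attention transfer.
# Assumption: not the authors' code; librosa and PyTorch available.
import librosa
import numpy as np
import torch
import torch.nn as nn


def logmel_with_deltas(wav_path, sr=16000, n_mels=64):
    """Log-Mel spectrogram stacked with its deltas and delta-deltas,
    giving a 3-channel time-frequency input as described in the abstract."""
    y, _ = librosa.load(wav_path, sr=sr)
    logmel = librosa.power_to_db(
        librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels))
    return np.stack([logmel,
                     librosa.feature.delta(logmel),
                     librosa.feature.delta(logmel, order=2)])  # (3, n_mels, T)


class SelfAttention(nn.Module):
    """Scaled dot-product self-attention over the time axis, capturing the
    long-term dependencies mentioned in the abstract."""
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x):  # x: (batch, time, dim)
        attn = torch.softmax(
            self.q(x) @ self.k(x).transpose(1, 2) * self.scale, dim=-1)
        return attn @ self.v(x), attn  # keep the attention map for transfer


def attention_transfer_loss(student_attn, teacher_attn):
    """L2 distance between student (SER) and teacher (speech recognition)
    attention maps; one plausible form of the attention-transfer objective."""
    return torch.mean((student_attn - teacher_attn) ** 2)
```

In such a setup, the teacher's attention maps would be precomputed by a pretrained speech-recognition model on the same utterances, and the transfer loss would be added to the emotion-classification loss during training.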
Pages: 43-54 (12 pages)