USING REGIONAL SALIENCY FOR SPEECH EMOTION RECOGNITION

Cited by: 0
Authors:
Aldeneh, Zakaria [1]
Provost, Emily Mower [1]
Affiliations:
[1] Univ Michigan, Comp Sci & Engn, Ann Arbor, MI 48109 USA
Source:
2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2017
Keywords:
speech emotion recognition; convolutional neural network; machine learning;
DOI: Not available
Chinese Library Classification: O42 [Acoustics]
Subject Classification: 070206; 082403
Abstract:
In this paper, we show that convolutional neural networks can be directly applied to temporal low-level acoustic features to identify emotionally salient regions without the need for defining or applying utterance-level statistics. We show how a convolutional neural network can be applied to minimally hand-engineered features to obtain competitive results on the IEMOCAP and MSP-IMPROV datasets. In addition, we demonstrate that, despite their common use across most categories of acoustic features, utterance-level statistics may obfuscate emotional information. Our results suggest that convolutional neural networks with Mel Filterbanks (MFBs) can be used as a replacement for classifiers that rely on features obtained from applying utterance-level statistics.
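The core idea in the abstract — convolving learned kernels over frame-level Mel filterbank (MFB) features and then max-pooling over time, so that no utterance-level statistics are needed — can be illustrated with a minimal pure-Python sketch. This is not the authors' actual model; the kernel values, dimensions, and helper names below are illustrative assumptions only.

```python
def conv1d_over_time(mfb, kernels):
    """Slide each kernel along the time axis of an MFB matrix.

    mfb: list of T frames, each a list of n_mels filterbank energies.
    kernels: list of K kernels, each a (width x n_mels) nested list.
    Returns a (T - width + 1) x K activation map after a ReLU.
    High activations mark emotionally salient regions in time.
    """
    width = len(kernels[0])
    n_mels = len(mfb[0])
    out = []
    for t in range(len(mfb) - width + 1):
        row = []
        for kern in kernels:
            s = sum(kern[i][j] * mfb[t + i][j]
                    for i in range(width)
                    for j in range(n_mels))
            row.append(max(s, 0.0))  # ReLU
        out.append(row)
    return out


def utterance_embedding(mfb, kernels):
    """Global max pooling over time: one value per kernel, so an
    utterance of any length T maps to a fixed K-dim vector.  This
    pooling step is what replaces utterance-level statistics over
    hand-engineered features."""
    acts = conv1d_over_time(mfb, kernels)
    return [max(col) for col in zip(*acts)]


# Toy usage: two kernels of width 2 over 3 mel bands (values invented).
kernels = [[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
           [[0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]]
mfb = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]]  # T = 4 frames
emb = utterance_embedding(mfb, kernels)  # length == number of kernels
```

Because the max pool collapses the time axis, utterances of different lengths yield embeddings of identical dimension, which is what lets the classifier operate without fixed-length utterance statistics.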
Pages: 2741-2745 (5 pages)