USING REGIONAL SALIENCY FOR SPEECH EMOTION RECOGNITION

Cited by: 0
Authors:
Aldeneh, Zakaria [1]
Provost, Emily Mower [1]
Affiliations:
[1] Univ Michigan, Comp Sci & Engn, Ann Arbor, MI 48109 USA
Source:
2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2017
Keywords:
speech emotion recognition; convolutional neural network; machine learning;
DOI: Not available
Chinese Library Classification: O42 [Acoustics]
Subject Classification: 070206; 082403
Abstract:
In this paper, we show that convolutional neural networks can be directly applied to temporal low-level acoustic features to identify emotionally salient regions without the need for defining or applying utterance-level statistics. We show how a convolutional neural network can be applied to minimally hand-engineered features to obtain competitive results on the IEMOCAP and MSP-IMPROV datasets. In addition, we demonstrate that, despite their common use across most categories of acoustic features, utterance-level statistics may obfuscate emotional information. Our results suggest that convolutional neural networks with Mel Filterbanks (MFBs) can be used as a replacement for classifiers that rely on features obtained from applying utterance-level statistics.
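The core idea in the abstract — convolving learned kernels over frame-level Mel filterbank (MFB) features and then max-pooling over time, so that no utterance-level statistics are needed — can be illustrated with a minimal pure-Python sketch. This is not the authors' actual model; the kernel values, dimensions, and helper names below are illustrative assumptions only.

```python
def conv1d_over_time(mfb, kernels):
    """Slide each kernel along the time axis of an MFB matrix.

    mfb: list of T frames, each a list of n_mels filterbank energies.
    kernels: list of K kernels, each a (width x n_mels) nested list.
    Returns a (T - width + 1) x K activation map after a ReLU.
    High activations mark emotionally salient regions in time.
    """
    width = len(kernels[0])
    n_mels = len(mfb[0])
    out = []
    for t in range(len(mfb) - width + 1):
        row = []
        for kern in kernels:
            s = sum(kern[i][j] * mfb[t + i][j]
                    for i in range(width)
                    for j in range(n_mels))
            row.append(max(s, 0.0))  # ReLU
        out.append(row)
    return out


def utterance_embedding(mfb, kernels):
    """Global max pooling over time: one value per kernel, so an
    utterance of any length T maps to a fixed K-dim vector.  This
    pooling step is what replaces utterance-level statistics over
    hand-engineered features."""
    acts = conv1d_over_time(mfb, kernels)
    return [max(col) for col in zip(*acts)]


# Toy usage: two kernels of width 2 over 3 mel bands (values invented).
kernels = [[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
           [[0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]]
mfb = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]]  # T = 4 frames
emb = utterance_embedding(mfb, kernels)  # length == number of kernels
```

Because the max pool collapses the time axis, utterances of different lengths yield embeddings of identical dimension, which is what lets the classifier operate without fixed-length utterance statistics.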
Pages: 2741-2745 (5 pages)