DEEP MULTIMODAL LEARNING FOR EMOTION RECOGNITION IN SPOKEN LANGUAGE

Cited by: 0

Authors:

Gu, Yue [1]

Chen, Shuhong [1]

Marsic, Ivan [1]

Affiliations:

[1] Rutgers State Univ, Dept Elect & Comp Engn, Piscataway, NJ 08854 USA

Source:

2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2018

Keywords:

Emotion recognition; spoken language; deep multimodal learning; sentiment analysis

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

In this paper, we present a novel deep multimodal framework to predict human emotions based on sentence-level spoken language. Our architecture has two distinctive characteristics. First, it extracts high-level features from both text and audio via a hybrid deep multimodal structure, which considers the spatial information from text, temporal information from audio, and high-level associations from low-level handcrafted features. Second, we fuse all features using a three-layer deep neural network to learn the correlations across modalities, and we train the feature extraction and fusion modules together, allowing optimal global fine-tuning of the entire structure. We evaluated the proposed framework on the IEMOCAP dataset. Our results show promising performance, achieving 60.4% weighted accuracy for five emotion categories.
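The abstract reports weighted accuracy but does not define it. As a minimal sketch, assuming the convention common in speech emotion recognition (weighted accuracy is the overall fraction of correctly classified utterances, while unweighted accuracy is the mean of per-class recalls), the two metrics could be computed as follows. The function names, labels, and toy predictions are hypothetical and are not taken from the paper.

```python
from collections import Counter


def weighted_accuracy(y_true, y_pred):
    """Overall fraction of correct predictions (assumed meaning of 'weighted accuracy')."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls, so every class counts equally regardless of size."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    totals = Counter(y_true)  # number of utterances per true class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct per class
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Toy, made-up labels: three "angry" utterances and one "happy" utterance.
y_true = ["angry", "angry", "angry", "happy"]
y_pred = ["angry", "angry", "angry", "sad"]

wa = weighted_accuracy(y_true, y_pred)    # 3 of 4 utterances correct
ua = unweighted_accuracy(y_true, y_pred)  # recall 1.0 for "angry", 0.0 for "happy"
print(wa, ua)
```

On imbalanced data such as this toy case the two metrics diverge, which is why it matters which one a reported figure refers to.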

Pages: 5079-5083

Number of pages: 5
