Deep neural networks for emotion recognition combining audio and transcripts

Cited by: 53
Authors
Cho, Jaejin [1]
Pappagari, Raghavendra [1]
Kulkarni, Purva [2]
Villalba, Jesus [1]
Carmiel, Yishay [2]
Dehak, Najim [1]
Affiliations
[1] Johns Hopkins Univ, Ctr Language Speech Proc, Baltimore, MD 21218 USA
[2] IntelligentWire, Seattle, WA USA
Keywords
emotion recognition; deep neural networks; automatic speech recognition
DOI
10.21437/Interspeech.2018-2466
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we propose to improve emotion recognition by combining acoustic information and conversation transcripts. On the one hand, an LSTM network was used to detect emotion from acoustic features such as f0, shimmer, jitter, and MFCCs. On the other hand, a multi-resolution CNN was used to detect emotion from word sequences. This CNN consists of several parallel convolutions with different kernel sizes to exploit contextual information at different levels. A temporal pooling layer aggregates the hidden representations of different words into a single sequence-level embedding, from which we computed the emotion posteriors. We optimized a weighted sum of classification and verification losses. The verification loss tries to bring embeddings from the same emotion closer together while separating embeddings from different emotions. We also compared our CNN with state-of-the-art text-based hand-crafted features (e-vector). We evaluated our approach on the USC-IEMOCAP dataset as well as on a dataset of US English telephone speech. For the former, we used human-annotated transcripts, while for the latter, we used ASR transcripts. The results showed that fusing audio and transcript information improved unweighted accuracy by a relative 24% on IEMOCAP and a relative 3.4% on the telephone data compared to a single acoustic system.
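The reported gains are relative improvements in unweighted accuracy. The sketch below shows how such a figure could be computed, assuming unweighted accuracy means the average of per-class recalls (its usual meaning in speech emotion recognition; the abstract does not define it). The function names and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls, so every emotion class counts equally."""
    total = defaultdict(int)    # utterances per true class
    correct = defaultdict(int)  # correctly classified utterances per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


def relative_improvement(baseline, improved):
    """Gain of `improved` over `baseline`, expressed as a fraction of baseline."""
    return (improved - baseline) / baseline


# Toy labels invented for illustration only (not data from the paper).
y_true = ["angry", "angry", "happy", "happy", "happy", "happy", "sad", "sad"]
audio_only = ["angry", "happy", "happy", "happy", "happy", "sad", "sad", "angry"]
fused = ["angry", "angry", "happy", "happy", "happy", "sad", "sad", "angry"]

ua_audio = unweighted_accuracy(y_true, audio_only)
ua_fused = unweighted_accuracy(y_true, fused)
gain = relative_improvement(ua_audio, ua_fused)
print(f"UA audio-only: {ua_audio:.3f}, UA fused: {ua_fused:.3f}, relative gain: {gain:.1%}")
```

A relative gain is measured against the baseline score rather than in absolute percentage points, so under this definition a relative 24% improvement would take a baseline of 0.50 to 0.62.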
Pages: 247-251
Page count: 5