A Neural Network Architecture for Children's Audio-Visual Emotion Recognition

Times cited: 1
Authors
Matveev, Anton [1]
Matveev, Yuri [1]
Frolova, Olga [1]
Nikolaev, Aleksandr [1]
Lyakso, Elena [1]
Affiliation
[1] St Petersburg University, Department of Higher Nervous Activity and Psychophysiology, Child Speech Research Group, St Petersburg 199034, Russia
Funding
Russian Science Foundation
Keywords
audio-visual speech; emotion recognition; children; multimodal fusion; speech; age
DOI
10.3390/math11224573
Chinese Library Classification number
O1 [Mathematics]
Discipline classification code
0701; 070101
Abstract
Detecting and understanding emotions are critical for our daily activities. As emotion recognition (ER) systems develop, research is moving beyond acted adult audio-visual speech toward more difficult cases. In this work, we investigate the automatic classification of children's audio-visual emotional speech, which presents several challenges, including the lack of publicly available annotated datasets and the low performance of state-of-the-art audio-visual ER systems. We first present a new corpus of children's audio-visual emotional speech that we collected. We then propose a neural network solution that makes better use of the temporal relationships between the audio and video modalities during cross-modal fusion for children's audio-visual emotion recognition. Taking a state-of-the-art neural network architecture as the baseline, we present several modifications that use attention to learn the cross-modal temporal relationships more deeply. In experiments comparing our approach with the baseline, we observe a relative performance improvement of 2%. We conclude that focusing more on cross-modal temporal relationships may be beneficial for building ER systems for child-machine communication and for environments where qualified professionals work with children.
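The abstract does not give layer-level details, so the following is only a generic sketch of cross-modal attention fusion, not the authors' architecture. In it, each time step of one modality attends over all time steps of the other modality, so the two streams can differ in length and frame rate. The sketch assumes that audio and video feature sequences have already been projected to a common feature size. It also omits the learned query/key/value projections a real model would use. The function names (`cross_modal_attention`, `fuse`) are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows are time steps, columns are feature dimensions


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention: each query time step attends over every
    time step of the other modality (no learned projections in this sketch)."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the other modality's frames for this query time step.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def mean_pool(seq: Matrix) -> List[float]:
    """Average a sequence over time, giving one vector per sequence."""
    return [sum(col) / len(seq) for col in zip(*seq)]


def fuse(audio: Matrix, video: Matrix) -> List[float]:
    """Hypothetical fusion: audio attends to video and video attends to audio,
    each attended sequence is mean-pooled over time, and the two pooled
    vectors are concatenated into one clip-level embedding."""
    a2v = cross_modal_attention(audio, video, video)  # audio queries, video keys/values
    v2a = cross_modal_attention(video, audio, audio)  # video queries, audio keys/values
    return mean_pool(a2v) + mean_pool(v2a)


# Usage: 3 audio frames and 2 video frames, both with 2-dimensional features.
audio = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
video = [[0.5, 0.5], [1.0, 0.0]]
embedding = fuse(audio, video)
print(len(embedding))  # 4
```

In a trained model, the clip-level embedding would feed a classifier over emotion labels. Attending in both directions, rather than concatenating per-frame features, lets each modality weight the time steps of the other that are most relevant to it.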
Pages: 17