Speech emotion recognition based on transfer learning from the FaceNet framework

Cited by: 21
Authors
Liu, Shuhua [1 ]
Zhang, Mengyu [1 ]
Fang, Ming [1 ]
Zhao, Jianwei [1 ]
Hou, Kun [1 ]
Hung, Chih-Cheng [2 ]
Affiliations
[1] Northeast Normal Univ, Changchun 130117, Jilin, Peoples R China
[2] Kennesaw State Univ, Coll Comp & Software Engn, Marietta, GA 30060 USA
DOI
10.1121/10.0003530
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Speech plays an important role in human-computer emotional interaction. FaceNet, used in face recognition, achieves great success due to its excellent feature extraction. In this study, we adopt the FaceNet model and improve it for speech emotion recognition. To apply this model to our work, speech signals are divided into segments at a given time interval, and the signal segments are transformed into a discrete waveform diagram and a spectrogram. Subsequently, the waveform and spectrogram are separately fed into FaceNet for end-to-end training. Our empirical study shows that pretraining is effective on the spectrogram for FaceNet. Hence, we pretrain the network on the CASIA dataset and then fine-tune it on the IEMOCAP dataset with waveforms. The network derives the maximum transfer learning knowledge from the CASIA dataset owing to its high accuracy, which may in turn be due to its clean signals. Our preliminary experimental results show accuracies of 68.96% and 90% on the emotion benchmark datasets IEMOCAP and CASIA, respectively. Cross-training is then conducted on the dataset, and comprehensive experiments are performed. Experimental results indicate that the proposed approach outperforms state-of-the-art methods on the IEMOCAP dataset among single-modal approaches.
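The preprocessing step described in the abstract (cutting a speech signal into fixed-interval segments and turning each segment into a spectrogram) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `split_into_segments` and `log_spectrogram`, the 1 s segment length, the 16 kHz sample rate, the 400-sample Hann frame with a 160-sample hop, and the synthetic 440 Hz tone standing in for speech are all assumptions chosen for illustration.

```python
import numpy as np


def split_into_segments(signal, sample_rate, segment_seconds):
    """Cut a 1-D signal into non-overlapping fixed-length segments.

    Samples left over after the last full segment are dropped (an assumption;
    the abstract does not say how the tail is handled).
    """
    seg_len = int(round(sample_rate * segment_seconds))
    n_segments = len(signal) // seg_len
    return signal[: n_segments * seg_len].reshape(n_segments, seg_len)


def log_spectrogram(segment, frame_len=400, hop=160):
    """Log-magnitude short-time Fourier transform of one segment.

    Returns an array of shape (n_frames, frame_len // 2 + 1).
    """
    n_frames = 1 + (len(segment) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack(
        [segment[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log(magnitude + 1e-8)  # small offset avoids log(0)


# A synthetic 3.5 s tone stands in for a speech recording (hypothetical input).
sample_rate = 16000
t = np.arange(int(3.5 * sample_rate)) / sample_rate
signal = np.sin(2 * np.pi * 440.0 * t)

segments = split_into_segments(signal, sample_rate, segment_seconds=1.0)
spectrograms = np.stack([log_spectrogram(s) for s in segments])
```

Under these assumptions, each row of `segments` is the raw waveform for one interval and each entry of `spectrograms` is the matching time-frequency image. Either array could then be rendered or resized into whatever input format an image network such as FaceNet expects.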
Pages: 1338-1345
Number of pages: 8