Speech emotion recognition based on transfer learning from the FaceNet framework

Times cited: 21
Authors
Liu, Shuhua [1]
Zhang, Mengyu [1]
Fang, Ming [1]
Zhao, Jianwei [1]
Hou, Kun [1]
Hung, Chih-Cheng [2]
Affiliations
[1] Northeast Normal Univ, Changchun 130117, Jilin, Peoples R China
[2] Kennesaw State Univ, Coll Comp & Software Engn, Marietta, GA 30060 USA
DOI
10.1121/10.0003530
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Speech plays an important role in emotional human-computer interaction. FaceNet, a face recognition model, has achieved great success owing to its strong feature extraction. In this study, we adopt the FaceNet model and adapt it for speech emotion recognition. To apply the model, speech signals are divided into segments at a fixed time interval, and each segment is transformed into a discrete waveform diagram and a spectrogram. The waveforms and spectrograms are then fed separately into FaceNet for end-to-end training. Our empirical study shows that pretraining is effective for FaceNet on spectrograms. Hence, we pretrain the network on the CASIA dataset and then fine-tune it on the IEMOCAP dataset with waveforms. This derives the maximum transfer-learning knowledge from the CASIA dataset, on which the network attains high accuracy, possibly because its signals are clean. Our preliminary experimental results show accuracies of 68.96% and 90% on the emotion benchmark datasets IEMOCAP and CASIA, respectively. Cross-training is then conducted across the datasets, and comprehensive experiments are performed. The experimental results indicate that the proposed approach outperforms state-of-the-art single-modal methods on the IEMOCAP dataset.
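The preprocessing described in the abstract (cutting speech into fixed-interval segments, then turning each segment into a waveform image and a spectrogram) can be sketched in plain Python as follows. This is a hedged illustration, not the authors' implementation: the segment length, image height, Hann window, frame length, hop size, and log-magnitude scaling are all assumptions, and the function names `segment_signal`, `waveform_image`, and `spectrogram` are hypothetical.

```python
import cmath
import math


def segment_signal(samples, sample_rate, seconds):
    """Split a 1-D signal into non-overlapping segments of `seconds` length.

    Assumption: a trailing remainder shorter than one segment is dropped.
    """
    seg_len = int(round(sample_rate * seconds))
    if seg_len <= 0:
        raise ValueError("segment length must be positive")
    return [samples[i:i + seg_len]
            for i in range(0, len(samples) - seg_len + 1, seg_len)]


def waveform_image(segment, height=32):
    """Rasterize a segment (amplitudes in [-1, 1]) into a binary image.

    The image has `height` rows and one column per sample; row 0 is the top
    (amplitude +1). Each column marks the single pixel nearest its sample.
    Assumption: this simple rasterization stands in for a waveform plot.
    """
    image = [[0] * len(segment) for _ in range(height)]
    for col, x in enumerate(segment):
        x = max(-1.0, min(1.0, x))  # clip out-of-range amplitudes
        row = int(round((1.0 - x) / 2.0 * (height - 1)))
        image[row][col] = 1
    return image


def spectrogram(segment, frame_len=64, hop=32):
    """Log-magnitude short-time Fourier transform with a periodic Hann window.

    Uses a naive DFT (O(frame_len^2) per frame) so the sketch needs only the
    standard library. Returns one list of frame_len // 2 + 1 bins per frame.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(segment) - frame_len + 1, hop):
        frame = [segment[start + n] * window[n] for n in range(frame_len)]
        bins = []
        for k in range(frame_len // 2 + 1):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            bins.append(math.log(abs(acc) + 1e-10))  # small floor avoids log(0)
        frames.append(bins)
    return frames
```

For example, one second of a 500 Hz tone sampled at 8 kHz, cut into 0.25 s segments, yields four segments of 2000 samples; with the defaults above, each segment's spectrogram has 61 frames of 33 bins, and the energy peaks in bin 4 (500 Hz divided by the 125 Hz bin spacing of 8000 Hz / 64). A practical pipeline would use an FFT library and feed the resulting 2-D arrays to the network as images; the naive DFT is used here only to keep the sketch self-contained.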
Pages: 1338-1345
Number of pages: 8