Speech emotion recognition based on transfer learning from the FaceNet frameworka)

被引:21
|
作者
Liu, Shuhua [1 ]
Zhang, Mengyu [1 ]
Fang, Ming [1 ]
Zhao, Jianwei [1 ]
Hou, Kun [1 ]
Hung, Chih-Cheng [2 ]
机构
[1] Northeast Normal Univ, Changchun 130117, Jilin, Peoples R China
[2] Kennesaw State Univ, Coll Comp & Software Engn, Marietta, GA 30060 USA
来源
关键词
D O I
10.1121/10.0003530
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
Speech plays an important role in human-computer emotional interaction. FaceNet used in face recognition achieves great success due to its excellent feature extraction. In this study, we adopt the FaceNet model and improve it for speech emotion recognition. To apply this model for our work, speech signals are divided into segments at a given time interval, and the signal segments are transformed into a discrete waveform diagram and spectrogram. Subsequently, the waveform and spectrogram are separately fed into FaceNet for end-to-end training. Our empirical study shows that the pretraining is effective on the spectrogram for FaceNet. Hence, we pretrain the network on the CASIA dataset and then fine-tune it on the IEMOCAP dataset with waveforms. It will derive the maximum transfer learning knowledge from the CASIA dataset due to its high accuracy. This high accuracy may be due to its clean signals. Our preliminary experimental results show an accuracy of 68.96% and 90% on the emotion benchmark datasets IEMOCAP and CASIA, respectively. The cross-training is then conducted on the dataset, and comprehensive experiments are performed. Experimental results indicate that the proposed approach outperforms state-of-the-art methods on the IEMOCAP dataset among single modal approaches.
引用
收藏
页码:1338 / 1345
页数:8
相关论文
共 50 条
  • [31] Deep learning based Affective Model for Speech Emotion Recognition
    Zhou, Xi
    Guo, Junqi
    Bie, Rongfang
    2016 INT IEEE CONFERENCES ON UBIQUITOUS INTELLIGENCE & COMPUTING, ADVANCED & TRUSTED COMPUTING, SCALABLE COMPUTING AND COMMUNICATIONS, CLOUD AND BIG DATA COMPUTING, INTERNET OF PEOPLE, AND SMART WORLD CONGRESS (UIC/ATC/SCALCOM/CBDCOM/IOP/SMARTWORLD), 2016, : 841 - 846
  • [32] Graph Learning Based Speaker Independent Speech Emotion Recognition
    Xu, Xinzhou
    Huang, Chengwei
    Wu, Chen
    Wang, Qingyun
    Zhao, Li
    ADVANCES IN ELECTRICAL AND COMPUTER ENGINEERING, 2014, 14 (02) : 17 - 22
  • [33] PREDICTION-BASED LEARNING FOR CONTINUOUS EMOTION RECOGNITION IN SPEECH
    Han, Jing
    Zhang, Zixing
    Ringeval, Fabien
    Schuller, Bjorn
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 5005 - 5009
  • [34] Deep Learning Approach towards Emotion Recognition Based on Speech
    Butala, Padmanabh
    Pawar, Rajendra
    Jadhav, Nagesh
    Kalangan, Manas
    Dhumal, Aniket
    Kakad, Sahil
    JOURNAL OF ADVANCED APPLIED SCIENTIFIC RESEARCH, 2024, 6 (03): : 16 - 24
  • [35] Unsupervised Feature Learning for Speech Emotion Recognition Based on Autoencoder
    Ying, Yangwei
    Tu, Yuanwu
    Zhou, Hong
    ELECTRONICS, 2021, 10 (17)
  • [36] Feature Fusion of Speech Emotion Recognition Based on Deep Learning
    Liu, Gang
    He, Wei
    Jin, Bicheng
    PROCEEDINGS OF 2018 INTERNATIONAL CONFERENCE ON NETWORK INFRASTRUCTURE AND DIGITAL CONTENT (IEEE IC-NIDC), 2018, : 193 - 197
  • [37] Speech Emotion Recognition based on Multi-Task Learning
    Zhao, Huijuan
    Han Zhijie
    Wang, Ruchuan
    2019 IEEE 5TH INTL CONFERENCE ON BIG DATA SECURITY ON CLOUD (BIGDATASECURITY) / IEEE INTL CONFERENCE ON HIGH PERFORMANCE AND SMART COMPUTING (HPSC) / IEEE INTL CONFERENCE ON INTELLIGENT DATA AND SECURITY (IDS), 2019, : 186 - 188
  • [38] AESR: Speech Recognition With Speech Emotion Recogniting Learning
    Han, RongQi
    Liu, Xin
    Zhang, Hui
    MAN-MACHINE SPEECH COMMUNICATION, NCMMSC 2024, 2025, 2312 : 91 - 101
  • [39] CycleGAN-based Emotion Style Transfer as Data Augmentation for Speech Emotion Recognition
    Bao, Fang
    Neumann, Michael
    Ngoc Thang Vu
    INTERSPEECH 2019, 2019, : 2828 - 2832
  • [40] Curriculum Learning for Speech Emotion Recognition From Crowdsourced Labels
    Lotfian, Reza
    Busso, Carlos
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2019, 27 (04) : 815 - 826