Model Smoothing using Virtual Adversarial Training for Speech Emotion Estimation using Spontaneity

Cited by: 0
Authors
Kuwahara, Toyoaki [1 ]
Orihara, Ryohei [1 ]
Sei, Yuichi [1 ]
Tahara, Yasuyuki [1 ]
Ohsuga, Akihiko [1 ]
Affiliations
[1] The University of Electro-Communications, Graduate School of Informatics and Engineering, Tokyo, Japan
Source
ICAART: PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON AGENTS AND ARTIFICIAL INTELLIGENCE, VOL 2 | 2020
Keywords
Deep Learning; Cross Corpus; Virtual Adversarial Training; Emotion Recognition; Speech Processing; Spontaneity; Deep Neural Network; Perception
DOI
10.5220/0008958405700577
CLC Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
The accuracy of speech-based emotion estimation has improved with the development of deep learning. However, most deep-learning approaches to emotion estimation rely on supervised learning, and the large labeled datasets needed for training are difficult to obtain. In addition, when the training environment differs significantly from the environment in which the model is actually used, the accuracy of emotion estimation degrades. To address these problems, we propose an emotion estimation model based on virtual adversarial training (VAT), a semi-supervised learning method that improves model robustness. Furthermore, research on the spontaneity of speech has advanced year by year, and recent studies have shown that emotion classification accuracy improves when spontaneity is taken into account; we therefore also investigate the effect of spontaneity in a cross-language setting. First, the VAT hyperparameters were set in a preliminary experiment on a single corpus. Next, a cross-corpus evaluation demonstrated the robustness of the resulting model. Finally, we evaluated emotion estimation accuracy with spontaneity taken into account and showed that doing so improves the accuracy of the VAT-trained model.
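As a rough illustration of the VAT objective mentioned in the abstract, the sketch below computes the local distributional smoothness loss in the style of Miyato et al.: a random perturbation direction is refined by power iteration, and the model is penalized when its prediction changes under the resulting virtual adversarial perturbation. This is a minimal sketch, not the authors' implementation; the PyTorch framing, the vat_loss and _l2_normalize helpers, and the hyperparameter defaults (xi, eps, n_power) are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def _l2_normalize(d):
    # Scale each sample's perturbation to unit L2 norm.
    norm = d.flatten(1).norm(dim=1).view(-1, *([1] * (d.dim() - 1)))
    return d / (norm + 1e-12)

def vat_loss(model, x, xi=1e-6, eps=2.5, n_power=1):
    """VAT / local distributional smoothness loss (Miyato et al. style).

    Requires no labels, so it can also be computed on unlabeled speech
    features; this is what makes the training semi-supervised.
    """
    with torch.no_grad():
        p = F.softmax(model(x), dim=1)          # reference prediction

    d = _l2_normalize(torch.randn_like(x))      # random start direction
    for _ in range(n_power):                    # power iteration
        d.requires_grad_(True)
        log_p_hat = F.log_softmax(model(x + xi * d), dim=1)
        dist = F.kl_div(log_p_hat, p, reduction="batchmean")
        d = _l2_normalize(torch.autograd.grad(dist, d)[0].detach())

    # Penalize prediction change under the adversarial perturbation.
    log_p_adv = F.log_softmax(model(x + eps * d), dim=1)
    return F.kl_div(log_p_adv, p, reduction="batchmean")
```

During training, this term would typically be added to the supervised cross-entropy loss, e.g. loss = F.cross_entropy(model(x_labeled), y) + alpha * vat_loss(model, x_unlabeled), where alpha is a weighting hyperparameter; because the VAT term needs no labels, the unlabeled portion of a corpus can contribute to smoothing the model.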
Pages: 570-577
Page count: 8