Segment Repetition Based on High Amplitude to Enhance a Speech Emotion Recognition

Cited by: 2
Authors
Prayitno, Bagas Adi [1]
Suyanto, Suyanto [1]
Affiliations
[1] Telkom Univ, Sch Comp, Jl Telekomunikasi 01 Terusan Buah Batu, Bandung 40257, West Java, Indonesia
Source
4TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND COMPUTATIONAL INTELLIGENCE (ICCSCI 2019) : ENABLING COLLABORATION TO ESCALATE IMPACT OF RESEARCH RESULTS FOR SOCIETY | 2019, Vol. 157
Keywords
data augmentation; high amplitude; long short-term memory; segment repetition; speech emotion recognition
DOI
10.1016/j.procs.2019.08.234
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Speech Emotion Recognition (SER) is a computer-based technology for realizing Human-Computer Interaction (HCI). It is a challenging task because of the lack of data. Several data augmentation methods have been created to increase data variation, but they do not significantly improve accuracy. Therefore, an additional data augmentation method called Segment Repetition based on High Amplitude (SRHA) is proposed to address this problem. The method repeats the segments that have the highest amplitude. An experiment with 10-times data augmentation, using five standard augmentations plus SRHA and a Long Short-Term Memory (LSTM) network as the classifier, shows that the proposed SRHA significantly increases SER accuracy from 95.88% to 98.16%. Further experiments with 20- and 40-times data augmentation also show that SRHA outperforms the five standard augmentations. These results indicate that SRHA is a powerful data augmentation method for SER. (C) 2019 The Authors. Published by Elsevier B.V.
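The abstract states only that SRHA repeats the segments with the highest amplitude; it does not give the segment length, the amplitude measure, or where the repeated copies are placed. The following is a minimal sketch of the general idea under assumed choices, not the authors' implementation: the function name `repeat_loudest_segment`, the fixed-length non-overlapping segmentation, the use of peak absolute value as "amplitude", and inserting the copies directly after the original segment are all hypothetical.

```python
from typing import List


def repeat_loudest_segment(
    samples: List[float], segment_len: int, repeats: int = 1
) -> List[float]:
    """Return a copy of `samples` with its highest-amplitude segment repeated.

    Assumptions (not taken from the paper): segments are fixed-length and
    non-overlapping, "highest amplitude" means the largest absolute sample
    value in a segment, and copies are inserted right after that segment.
    """
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    if repeats < 0:
        raise ValueError("repeats must be non-negative")
    if not samples:
        return []

    # Start index of each consecutive segment (the last one may be shorter).
    starts = range(0, len(samples), segment_len)
    # Peak absolute amplitude of each segment.
    peaks = [max(abs(x) for x in samples[s:s + segment_len]) for s in starts]
    # Start index of the loudest segment (the first one wins on ties).
    best = starts[peaks.index(max(peaks))]
    segment = samples[best:best + segment_len]

    # Insert `repeats` extra copies immediately after the original segment.
    end = best + len(segment)
    return samples[:end] + segment * repeats + samples[end:]


signal = [0.1, -0.2, 0.9, -0.8, 0.05, 0.0]
augmented = repeat_loudest_segment(signal, segment_len=2, repeats=2)
print(augmented)
```

In this toy signal the loudest two-sample segment is `[0.9, -0.8]`, so the augmented signal contains it three times in a row. A real pipeline would apply such a transform to audio waveforms before feature extraction, with parameters chosen per the paper.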
Pages: 420-426
Number of pages: 7