Segment Repetition Based on High Amplitude to Enhance a Speech Emotion Recognition

Cited by: 2

Authors:

Prayitno, Bagas Adi [1]

Suyanto, Suyanto [1]

Affiliations:

[1] Telkom Univ, Sch Comp, Jl Telekomunikasi 01 Terusan Buah Batu, Bandung 40257, West Java, Indonesia

Source:

4TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND COMPUTATIONAL INTELLIGENCE (ICCSCI 2019) : ENABLING COLLABORATION TO ESCALATE IMPACT OF RESEARCH RESULTS FOR SOCIETY | 2019 / Vol. 157

Keywords:

data augmentation; high amplitude; long short-term memory; segment repetition; speech emotion recognition;

DOI:

10.1016/j.procs.2019.08.234

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Speech Emotion Recognition (SER) is a computer-based technology for realizing Human-Computer Interaction (HCI). It is a challenging task because of the lack of data. Several data augmentation methods have been created to increase data variation, but they do not significantly improve accuracy. Therefore, an additional data augmentation method called Segment Repetition based on High Amplitude (SRHA) is proposed to address this problem. The method repeats the segments that have the highest amplitude. An experiment with 10-fold data augmentation, using five standard augmentations plus SRHA and a Long Short-Term Memory (LSTM) network as the classifier, shows that SRHA significantly increases SER accuracy from 95.88% to 98.16%. Further experiments with 20-fold and 40-fold data augmentation also show that SRHA outperforms the five standard augmentations. These results indicate that SRHA is a powerful data augmentation method for SER. © 2019 The Authors. Published by Elsevier B.V.
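
The abstract describes SRHA only at a high level: the segments with the highest amplitude are repeated. The following is a minimal Python sketch of that idea, not the authors' implementation. The function name `repeat_high_amplitude_segment`, the fixed non-overlapping segmentation, the use of peak absolute amplitude as the selection criterion, and the `segment_len` and `repeats` parameters are all assumptions made for illustration; the record does not specify any of them.

```python
def repeat_high_amplitude_segment(signal, segment_len, repeats=1):
    """Return a copy of `signal` with its loudest segment repeated in place.

    Hypothetical helper: the segmentation scheme and the selection rule
    (peak absolute amplitude) are assumptions, not taken from the paper.
    """
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    if repeats < 0:
        raise ValueError("repeats must be non-negative")
    if not signal:
        return []

    # Split into consecutive, non-overlapping segments
    # (the last one may be shorter than segment_len).
    segments = [signal[i:i + segment_len]
                for i in range(0, len(signal), segment_len)]

    # Pick the segment with the largest peak absolute amplitude
    # (the first one wins on ties).
    peaks = [max(abs(x) for x in seg) for seg in segments]
    loudest = peaks.index(max(peaks))

    # Rebuild the signal, inserting extra copies right after the loudest segment.
    augmented = []
    for idx, seg in enumerate(segments):
        augmented.extend(seg)
        if idx == loudest:
            for _ in range(repeats):
                augmented.extend(seg)
    return augmented


samples = [0.1, -0.2, 0.9, -0.8, 0.05, 0.0]
augmented = repeat_high_amplitude_segment(samples, segment_len=2, repeats=1)
print(augmented)  # [0.1, -0.2, 0.9, -0.8, 0.9, -0.8, 0.05, 0.0]
```

In this toy input the middle segment `[0.9, -0.8]` has the highest peak, so it appears twice in the output. A real pipeline would operate on audio samples or feature frames and would likely choose the segment length and repetition count empirically.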

Pages: 420-426

Number of pages: 7

50 items in total

[1] Speech Emotion Recognition Based on Acoustic Segment Model
Zheng, Siyuan
Du, Jun
Zhou, Hengshun
Bai, Xue
Lee, Chin-Hui
Li, Shipeng
2021 12TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2021,
[2] Speech emotion recognition based on prosodic segment level features
Han, Wenjing
Li, Haifeng
Qinghua Daxue Xuebao/Journal of Tsinghua University, 2009, 49 (SUPPL. 1) : 1363 - 1368
[3] Timing Levels in Segment-Based Speech Emotion Recognition
Schuller, Bjoern
Rigoll, Gerhard
INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 1818 - 1821
[4] Speech Emotion Recognition Based on Speech Segment Using LSTM with Attention Model
Atmaja, Bagus Tris
Akagi, Masato
2019 IEEE INTERNATIONAL CONFERENCE ON SIGNALS AND SYSTEMS (ICSIGSYS), 2019, : 40 - 44
[5] Amplitude Modulation Features for Emotion Recognition from Speech
Alam, Md Jahangir
Attabi, Yazid
Dumouchel, Pierre
Kenny, Patrick
O'Shaughnessy, D.
14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 2419 - 2423
[6] Segment-based emotion recognition from continuous Mandarin Chinese speech
Yeh, Jun-Heng
Pao, Tsang-Long
Lin, Ching-Yi
Tsai, Yao-Wei
Chen, Yu-Te
COMPUTERS IN HUMAN BEHAVIOR, 2011, 27 (05) : 1545 - 1552
[7] Segment-Based Speech Emotion Recognition Using Recurrent Neural Networks
Tzinis, Efthymios
Potamianos, Alexandros
2017 SEVENTH INTERNATIONAL CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION (ACII), 2017, : 190 - 195
[8] Multistage classification scheme to enhance speech emotion recognition
S. S. Poorna
G. J. Nair
International Journal of Speech Technology, 2019, 22 : 327 - 340
[9] Multistage classification scheme to enhance speech emotion recognition
Poorna, S. S.
Nair, G. J.
INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2019, 22 (02) : 327 - 340
[10] Speech emotion recognition based on emotion perception
Gang Liu
Shifang Cai
Ce Wang
EURASIP Journal on Audio, Speech, and Music Processing, 2023
