Speaker normalisation for speech-based emotion detection

Cited by: 32

Authors:

Sethu, Vidhyasaharan [1, 2]

Ambikairajah, Eliathainby [1, 2]

Epps, Julien [1, 3]

Affiliations:

[1] Univ New S Wales, Sch Elect Engn & Telecommun, Sydney, NSW 2052, Australia

[2] NICTA, Sydney, NSW, Australia

[3] UNSW Asia, Singapore 248922, Singapore

Source:

PROCEEDINGS OF THE 2007 15TH INTERNATIONAL CONFERENCE ON DIGITAL SIGNAL PROCESSING | 2007

Keywords:

feature warping; cumulative distribution mapping; emotion detection; hidden Markov model;

DOI:

10.1109/ICDSP.2007.4288656

Chinese Library Classification (CLC) number:

TP39 [Computer applications];

Discipline classification codes:

081203 ; 0835 ;

Abstract:

The focus of this paper is on speech-based emotion detection utilising only acoustic data, i.e. without using any linguistic or semantic information. However, this approach in general suffers from the fact that acoustic data is speaker-dependent, which can result in inefficient estimation of the statistics modelled by classifiers such as hidden Markov models (HMMs) and Gaussian mixture models (GMMs). We propose the use of speaker-specific feature warping as a means of normalising acoustic features to overcome the problem of speaker dependency. In this paper we compare the performance of a system that uses feature warping to one that does not. The back-end employs an HMM-based classifier that captures the temporal variations of the feature vectors by modelling them as transitions between different states. Evaluations conducted on the LDC Emotional Prosody speech corpus reveal a relative increase in classification accuracy of up to 20%.
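The abstract names speaker-specific feature warping (cumulative distribution mapping) but gives no implementation details. The sketch below is a minimal illustration of the general idea, not the authors' implementation: each feature dimension of one speaker's data is mapped through its empirical CDF onto a target distribution. The standard normal target, the rank-based CDF estimate, the use of all of a speaker's frames at once, and the function names `warp_feature` and `warp_speaker` are all assumptions made here for illustration.

```python
from statistics import NormalDist


def warp_feature(values):
    """Map one speaker's values of a single feature onto a standard normal.

    Assumption: the target distribution is N(0, 1) and the empirical CDF is
    estimated from ranks over all of the speaker's frames.
    """
    n = len(values)
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    # Frame indices sorted by feature value (ties keep their original order).
    order = sorted(range(n), key=lambda i: values[i])
    warped = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        # Midpoint rank keeps the CDF value strictly inside (0, 1).
        cdf = (rank - 0.5) / n
        warped[idx] = std_normal.inv_cdf(cdf)
    return warped


def warp_speaker(frames):
    """Warp each feature dimension independently over one speaker's frames.

    `frames` is a list of equal-length feature vectors from a single speaker.
    """
    dims = list(zip(*frames))
    warped_dims = [warp_feature(list(d)) for d in dims]
    return [list(f) for f in zip(*warped_dims)]


# Hypothetical two-dimensional features from one speaker (illustrative values).
speaker_frames = [[3.0, 10.0], [1.0, 30.0], [2.0, 20.0]]
normalised = warp_speaker(speaker_frames)
```

Because the mapping depends only on the rank of each value within a speaker's own data, two speakers whose raw features differ in offset or scale end up with the same warped distribution, which is the sense in which the normalisation reduces speaker dependency before the features reach a classifier.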

Pages: 611 / +

Page count: 2
