Speaker normalisation for speech-based emotion detection

Cited by: 32
Authors
Sethu, Vidhyasaharan [1, 2]
Ambikairajah, Eliathainby [1, 2]
Epps, Julien [1, 3]
Affiliations
[1] Univ New S Wales, Sch Elect Engn & Telecommun, Sydney, NSW 2052, Australia
[2] NICTA, Sydney, NSW, Australia
[3] UNSW Asia, Singapore 248922, Singapore
Source
PROCEEDINGS OF THE 2007 15TH INTERNATIONAL CONFERENCE ON DIGITAL SIGNAL PROCESSING | 2007
Keywords
feature warping; cumulative distribution mapping; emotion detection; hidden Markov model
DOI
10.1109/ICDSP.2007.4288656
Chinese Library Classification number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
The focus of this paper is on speech-based emotion detection utilising only acoustic data, i.e. without using any linguistic or semantic information. However, this approach generally suffers from the fact that acoustic data is speaker-dependent, which can result in inefficient estimation of the statistics modelled by classifiers such as hidden Markov models (HMMs) and Gaussian mixture models (GMMs). We propose the use of speaker-specific feature warping as a means of normalising acoustic features to overcome the problem of speaker dependency. In this paper we compare the performance of a system that uses feature warping to one that does not. The back-end employs an HMM-based classifier that captures the temporal variations of the feature vectors by modelling them as transitions between different states. Evaluations conducted on the LDC Emotional Prosody speech corpus reveal a relative increase in classification accuracy of up to 20%.
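The abstract does not give implementation details, so the following is only a minimal sketch of what speaker-specific feature warping (cumulative distribution mapping) could look like. It assumes a single scalar feature per frame, a standard normal target distribution, and a rank-based estimate of each speaker's empirical CDF over all of that speaker's frames; the function names `warp_feature` and `warp_per_speaker` and the pitch values are hypothetical and not taken from the paper.

```python
from statistics import NormalDist


def warp_feature(values):
    """Warp one speaker's feature values onto a standard normal distribution.

    Each value is replaced by the standard normal quantile of its rank,
    so the warped values follow the same target distribution whatever
    the speaker's original range was. (Assumption: rank-based CDF.)
    """
    n = len(values)
    inv_cdf = NormalDist().inv_cdf
    # Indices of the frames sorted by ascending feature value.
    order = sorted(range(n), key=lambda i: values[i])
    warped = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        # (rank - 0.5) / n lies strictly inside (0, 1), so inv_cdf is defined.
        warped[idx] = inv_cdf((rank - 0.5) / n)
    return warped


def warp_per_speaker(values, speaker_ids):
    """Apply warp_feature separately to the frames of each speaker."""
    groups = {}
    for i, spk in enumerate(speaker_ids):
        groups.setdefault(spk, []).append(i)
    warped = [0.0] * len(values)
    for indices in groups.values():
        speaker_warped = warp_feature([values[i] for i in indices])
        for i, w in zip(indices, speaker_warped):
            warped[i] = w
    return warped


# Hypothetical pitch values (Hz) for two speakers with different ranges.
pitch = [110.0, 120.0, 130.0, 210.0, 220.0, 230.0]
speakers = ["a", "a", "a", "b", "b", "b"]
warped = warp_per_speaker(pitch, speakers)
```

In this toy example both speakers end up with identical warped values even though their raw pitch ranges do not overlap, which is the kind of speaker dependency the normalisation is meant to remove. Tied values receive distinct ranks here; a fuller implementation might average them.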
Pages: 611 / +
Number of pages: 2