Enhancement of emotion detection in spoken dialogue systems by combining several information sources

Cited by: 13
Authors
Lopez-Cozar, Ramon [1 ]
Silovsky, Jan [2 ]
Kroul, Martin [2 ]
Affiliations
[1] Univ Granada, Dept Languages & Comp Syst, Fac Comp Sci, E-18071 Granada, Spain
[2] Tech Univ Liberec, Inst Informat Technol & Elect, Fac Mechatron, Liberec, Czech Republic
Keywords
Adaptive spoken dialogue systems; Combination of classifiers; Information fusion; Emotion detection; Human computer interaction; RECOGNITION; AGREEMENT; USER
DOI
10.1016/j.specom.2011.01.006
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
This paper proposes a technique to enhance emotion detection in spoken dialogue systems by means of two modules that combine different information sources. The first one, called Fusion-0, combines emotion predictions generated by a set of classifiers that deal with different kinds of information about each sentence uttered by the user. To do this, the module employs several methods for information fusion that produce other predictions about the emotional state of the user. The predictions are the input to the second information fusion module, called Fusion-1, where they are combined to deduce the emotional state of the user. Fusion-0 represents a method employed in previous studies to enhance classification rates, whereas Fusion-1 represents the novelty of the technique, which is the combination of emotion predictions generated by Fusion-0. One advantage of the technique is that it can be applied as a posterior processing stage to any other methods that combine information from different information sources at the decision level. This is so because the technique works on the predictions (outputs) of the methods, without interfering in the procedure used to obtain these predictions. Another advantage is that the technique can be implemented as a modular architecture, which facilitates the setting up within a spoken dialogue system as well as the deduction of the emotional state of the user in real time. Experiments have been carried out considering classifiers to deal with prosodic, acoustic, lexical, and dialogue acts information, and three methods to combine information: multiplication of probabilities, average of probabilities, and unweighted vote. The results show that the technique enhances the classification rates of the standard fusion by 2.27% and 3.38% absolute in experiments carried out considering two and three emotion categories, respectively. (C) 2011 Elsevier B.V. All rights reserved.
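The two-stage combination described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the emotion labels, and the probability values are hypothetical, and the abstract does not say how Fusion-1 combines its inputs, so an unweighted vote with a mean-probability tie-break is assumed here.

```python
from collections import Counter
from math import prod


def fuse_product(preds):
    """Multiply per-class probabilities across classifiers, then renormalize."""
    scores = {c: prod(p[c] for p in preds) for c in preds[0]}
    total = sum(scores.values())
    if total == 0:
        # All products vanished: fall back to a uniform distribution.
        return {c: 1.0 / len(scores) for c in scores}
    return {c: s / total for c, s in scores.items()}


def fuse_average(preds):
    """Average per-class probabilities across classifiers."""
    return {c: sum(p[c] for p in preds) / len(preds) for c in preds[0]}


def fuse_vote(preds):
    """Unweighted vote: each classifier votes for its most probable class."""
    votes = Counter(max(p, key=p.get) for p in preds)
    return {c: votes.get(c, 0) / len(preds) for c in preds[0]}


def fusion0(preds):
    """Apply the three fusion methods named in the abstract to the classifier outputs."""
    return [fuse_product(preds), fuse_average(preds), fuse_vote(preds)]


def fusion1(fused):
    """ASSUMPTION: combine the Fusion-0 predictions by unweighted vote,
    breaking ties with the mean probability across the fused predictions."""
    votes = Counter(max(p, key=p.get) for p in fused)
    mean = fuse_average(fused)
    return max(votes, key=lambda c: (votes[c], mean[c]))


# Hypothetical outputs of prosodic, acoustic, lexical, and dialogue-act
# classifiers for one user utterance, over two made-up emotion categories.
example = [
    {"neutral": 0.60, "angry": 0.40},
    {"neutral": 0.30, "angry": 0.70},
    {"neutral": 0.20, "angry": 0.80},
    {"neutral": 0.55, "angry": 0.45},
]

if __name__ == "__main__":
    print(fusion1(fusion0(example)))
```

Because Fusion-1 only consumes the predictions produced by Fusion-0, it can sit behind any decision-level fusion method without changing how that method is computed, which is the modularity the abstract highlights.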
Pages: 1210-1228
Number of pages: 19
Related papers
44 in total
  • [21] Robust analysis of spoken input combining statistical and knowledge-based information sources
    Cattoni, R
    Federico, M
    Lavie, A
    ASRU 2001: IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING, CONFERENCE PROCEEDINGS, 2001, : 347 - 350
  • [22] Combining Chinese Spoken Term Detection Systems via Side-information Conditioned Linear Logistic Regression
    Meng, Sha
    Zhang, Wei-Qiang
    Liu, Jia
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010, : 685 - 688
  • [23] COMBINING TEMPORAL AND SPECTRAL INFORMATION FOR QUERY-BY-EXAMPLE SPOKEN TERM DETECTION
    Gracia, Ciro
    Anguera, Xavier
    Binefa, Xavier
    2014 PROCEEDINGS OF THE 22ND EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2014, : 1487 - 1491
  • [24] Emotion Analysis and Dialogue Breakdown Detection in Dialogue of Chat Systems Based on Deep Neural Networks
    Matsumoto, Kazuyuki
    Sasayama, Manabu
    Yoshida, Minoru
    Kita, Kenji
    Ren, Fuji
    ELECTRONICS, 2022, 11 (05)
  • [25] Combining Several User Models to Improve and Adapt the Dialog Management Process in Spoken Dialog Systems
    Griol, David
    Manuel Molina, Jose
    Sanchis, Araceli
    Callejas, Zoraida
    FUTURE AND EMERGENT TRENDS IN LANGUAGE TECHNOLOGY, FETLT 2015, 2016, 9577 : 65 - 76
  • [26] A methodology for turn-taking capabilities enhancement in Spoken Dialogue Systems using Reinforcement Learning
    Khouzaimi, Hatim
    Laroche, Romain
    Lefevre, Fabrice
    COMPUTER SPEECH AND LANGUAGE, 2018, 47 : 93 - 111
  • [27] EVALUATION OF A USER-ADAPTED SPOKEN LANGUAGE DIALOGUE SYSTEM Measuring the Relevance of the Contextual Information Sources
    Manuel Lucas-Cuesta, Juan
    Fernandez-Martinez, Fernando
    Dragos Rada, G.
    Lutfi, Syaheerah L.
    Ferreiros, Javier
    ICAART 2011: PROCEEDINGS OF THE 3RD INTERNATIONAL CONFERENCE ON AGENTS AND ARTIFICIAL INTELLIGENCE, VOL 1, 2011, : 218 - 223
  • [28] Dialogue Act Detection in Error-Prone Spoken Dialogue Systems Using Partial Sentence Tree and Latent Dialogue Act Matrix
    Liang, Wei-Bin
    Wu, Chung-Hsien
    Hsiao, Yu-Cheng
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, : 3038 - 3041
  • [29] COMBINING DEPTH INFORMATION AND LOCAL EDGE DETECTION FOR STEREO IMAGE ENHANCEMENT
    Hachicha, Walid
    Beghdadi, Azeddine
    Cheikh, Faouzi Alaya
    2012 PROCEEDINGS OF THE 20TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2012, : 250 - 254
  • [30] Human fall detection algorithm combining information enhancement and feature fusion
    Wang, Fengsui
    Shao, Kaili
    Yang, Haiyan
    Zhongguo Guanxing Jishu Xuebao/Journal of Chinese Inertial Technology, 2024, 32 (08): : 771 - 778