Continuous emotion recognition with phonetic syllables

Cited by: 29
Authors
Origlia, A. [1 ]
Cutugno, F. [1 ]
Galata, V. [2 ]
Affiliations
[1] Univ Naples Federico II, Dept Elect Engn & Informat Technol DIETI, Language Understanding & Speech Interfaces LUSI L, I-80125 Naples, Italy
[2] CNR, Inst Cognit Sci & Technol ISTC, Padua, Italy
Keywords
Affective computing; Feature extraction; Phonetic syllables; Valence-Activation-Dominance space; FUNDAMENTAL-FREQUENCY; SPEECH; MODEL; PERCEPTION; FEATURES; RHYTHM;
DOI
10.1016/j.specom.2013.09.012
Chinese Library Classification
O42 [Acoustics];
Subject classification codes
070206; 082403;
Abstract
As research on extracting acoustic properties of speech for emotion recognition progresses, the need to investigate feature extraction methods that account for the requirements of real-time processing systems becomes more important. Past work has shown the importance of syllables for the transmission of emotions, while classical research methods in prosody show that it is important to concentrate on specific areas of the speech signal when studying intonation phenomena. Technological approaches, however, are often designed to use the whole speech signal without taking into account the qualitative variability of its spectral content. Given this contrast with the theoretical basis of prosodic research, we present a feature extraction method built on a phonetic interpretation of the concept of the syllable. In particular, we concentrate on the spectral content of syllabic nuclei, thus reducing the amount of information to be processed. Moreover, we introduce feature weighting based on syllabic prominence, so that not all units of analysis are treated as equally important. The method is evaluated on a continuous, three-dimensional model of emotions built on the classical axes of Valence, Activation and Dominance, and is shown to be competitive with state-of-the-art performance. The potential impact of this approach on the design of affective computing systems is also analysed. (C) 2013 Elsevier B.V. All rights reserved.
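The abstract describes two ideas that can be sketched together: features are extracted only from syllabic-nucleus regions, and each nucleus contributes to the utterance-level representation in proportion to its prominence rather than equally. The following is a minimal illustrative sketch of such prominence-weighted pooling; the function name, the choice of a weighted mean as the aggregation, and the example feature values are assumptions for illustration, not the paper's actual implementation.

```python
def pooled_features(nucleus_features, prominence):
    """Prominence-weighted mean of per-nucleus feature vectors.

    nucleus_features: list of equal-length feature vectors, one per
                      syllabic nucleus (e.g. F0 and energy statistics).
    prominence:       one non-negative prominence score per nucleus.
    """
    if len(nucleus_features) != len(prominence):
        raise ValueError("one prominence score per nucleus is required")
    total = sum(prominence)
    if total == 0:
        raise ValueError("at least one nucleus must have nonzero prominence")
    dim = len(nucleus_features[0])
    pooled = [0.0] * dim
    for vec, weight in zip(nucleus_features, prominence):
        for i, value in enumerate(vec):
            # Each nucleus contributes in proportion to its prominence.
            pooled[i] += (weight / total) * value
    return pooled


# Three nuclei with two features each (say, mean F0 in Hz and energy),
# the second nucleus being the most prominent.
feats = [[120.0, 0.4], [180.0, 0.9], [140.0, 0.5]]
weights = [0.2, 0.6, 0.2]
print(pooled_features(feats, weights))  # pulled toward the prominent nucleus
```

In a full system, the pooled vector would then feed a regressor predicting continuous Valence, Activation and Dominance values; restricting extraction to nuclei keeps the amount of signal to process small, which matches the real-time motivation stated above.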
Pages: 155-169
Page count: 15