Evaluation of the Effect of Frame Size on Speech Emotion Recognition

Cited: 0
Authors
Ozseven, Turgut [1 ]
Affiliations
[1] Tokat Gaziosmanpasa Univ, Dept Comp Engn, Tokat, Turkey
Source
2018 2ND INTERNATIONAL SYMPOSIUM ON MULTIDISCIPLINARY STUDIES AND INNOVATIVE TECHNOLOGIES (ISMSIT) | 2018
Keywords
framing; frame size; acoustic analysis; speech emotion recognition; FEATURES; CLASSIFICATION; EXPRESSION; SELECTION; SCHEME;
DOI
not available
Chinese Library Classification
TP39 [Computer applications];
Discipline codes
081203; 0835;
Abstract
Speech emotion recognition aims to determine a person's emotional state by processing speech with digital signal processing methods. It is influenced by many factors, such as the language, the demographic characteristics of the speakers, and the signal processing methods used. Pre-processing the speech signal before emotion recognition improves the signal and increases recognition performance. Framing, one of these pre-processing steps, divides the speech signal into short segments. In this study, we investigated the effect of the frame size used in framing on emotion recognition. In addition, we determined the most appropriate frame size for the datasets most commonly used in the literature. According to the results obtained, the best frame size differs from dataset to dataset. Furthermore, even a 1 ms difference in frame size changes the emotion recognition rate.
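The framing step described in the abstract can be sketched in plain Python: the signal is cut into fixed-length frames that advance by a hop step. The function name `frame_signal` and the 25 ms frame / 10 ms hop values below are illustrative defaults commonly used in speech processing, not parameters taken from the paper.

```python
def frame_signal(signal, sample_rate, frame_ms, hop_ms):
    """Split a 1-D signal into fixed-size frames.

    frame_ms: frame size in milliseconds (the quantity varied in the paper).
    hop_ms:   step between consecutive frame starts, in milliseconds.
    """
    frame_len = int(sample_rate * frame_ms / 1000)  # samples per frame
    hop_len = int(sample_rate * hop_ms / 1000)      # samples per hop
    frames = []
    start = 0
    while start + frame_len <= len(signal):         # drop the incomplete tail
        frames.append(signal[start:start + frame_len])
        start += hop_len
    return frames

# Example: 1 second of audio at 16 kHz, 25 ms frames with a 10 ms hop.
sig = [0.0] * 16000
frames = frame_signal(sig, 16000, 25, 10)
```

Note that changing `frame_ms` by even 1 ms changes the number of samples per frame (here, by 16 samples at 16 kHz), which in turn shifts every frame-level acoustic feature; this is the sensitivity the study measures.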
Pages: 18-21
Page count: 4