Hierarchical sparse coding framework for speech emotion recognition

Cited by: 15
Authors
Torres-Boza, Diana [1]
Oveneke, Meshia Cedric [1]
Wang, Fengna [1]
Jiang, Dongmei [2]
Verhelst, Werner [1]
Sahli, Hichem [1, 3]
Affiliations
[1] VUB, Dept Elect & Informat ETRO, VUB NPU Joint Audiovisual Signal Proc AVSP Res La, Pleinlaan 2, B-1050 Brussels, Belgium
[2] NPU, Shaanxi Key Lab Speech & Image Informat Proc, VUB NPU Joint Audiovisual Signal Proc AVSP Res La, Youyo Xilu 127, Xian 710072, Shaanxi, Peoples R China
[3] Interuniv Microelect Ctr IMEC, Kapeldreef 75, B-3001 Heverlee, Belgium
Funding
National Natural Science Foundation of China;
Keywords
Affective computing; Speech emotion recognition; Sparse coding; Support vector regression; CLASSIFICATION; FEATURES; AUDIO; MODEL;
DOI
10.1016/j.specom.2018.01.006
Chinese Library Classification number
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
Finding an appropriate feature representation for audio data is central to speech emotion recognition. Most existing audio features rely on hand-crafted feature encoding techniques, such as the AVEC challenge feature set. An alternative approach is to use features that are learned automatically. This has the advantage of generalizing well to new data, particularly if the features are learned in an unsupervised manner with fewer restrictions on the data itself. In this work, we adopt the sparse coding framework as a means to automatically represent features from audio and propose a hierarchical sparse coding (HSC) scheme. Experimental results indicate that the features, obtained in an unsupervised fashion, are able to capture useful properties of the speech that distinguish between emotions.
Pages: 80-89
Number of pages: 10