Deep features-based speech emotion recognition for smart affective services

Cited by: 116
Authors
Badshah, Abdul Malik [1]
Rahim, Nasir [1]
Ullah, Noor [1]
Ahmad, Jamil [1]
Muhammad, Khan [1]
Lee, Mi Young [1]
Kwon, Soonil [1]
Baik, Sung Wook [1]
Affiliations
[1] Sejong Univ, Digital Contents Res Inst, Seoul, South Korea
Keywords
Speech emotion recognition; Convolutional neural network; Spectrogram; Rectangular kernels; SCHEMES; CLASSIFICATION
DOI
10.1007/s11042-017-5292-7
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Emotion recognition from speech signals is an interesting research area with several applications, such as smart healthcare, autonomous voice response systems, assessing situational seriousness through caller affective state analysis in emergency centers, and other smart affective services. In this paper, we present a study of speech emotion recognition based on features extracted from spectrograms using a deep convolutional neural network (CNN) with rectangular kernels. Typically, CNNs have square kernels and pooling operators at various layers, which are suited to 2D image data. In spectrograms, however, the information is encoded in a slightly different manner: time is represented along the x-axis, frequency along the y-axis, and amplitude is indicated by the intensity value at a particular position. To analyze speech through spectrograms, we propose rectangular kernels of varying shapes and sizes, along with max pooling in rectangular neighborhoods, to extract discriminative features. The proposed scheme effectively learns discriminative features from speech spectrograms and performs better than many state-of-the-art techniques when evaluated on Emo-DB and a Korean speech dataset.
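To make the idea of rectangular kernels and rectangular pooling concrete, here is a minimal pure-Python sketch. It is not the network described in the paper: the toy spectrogram, the kernel shapes (5 x 2 and 2 x 5), the averaging weights, the 1 x 2 pooling window, and the helper names conv2d_valid and max_pool are all assumptions chosen for illustration. It only shows how non-square windows slide over a frequency-by-time grid and how they change the output shape.

```python
# Toy illustration only: shapes, weights, and helper names are assumptions,
# not the configuration used in the paper.

def conv2d_valid(spec, kernel):
    """Stride-1, no-padding 2D cross-correlation of spec with kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(spec) - kh + 1
    out_w = len(spec[0]) - kw + 1
    return [
        [
            sum(spec[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool(fmap, pool_h, pool_w):
    """Non-overlapping max pooling over pool_h x pool_w neighborhoods."""
    out_h = len(fmap) // pool_h
    out_w = len(fmap[0]) // pool_w
    return [
        [
            max(fmap[i * pool_h + a][j * pool_w + b]
                for a in range(pool_h) for b in range(pool_w))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Toy spectrogram: rows are frequency bins (y-axis), columns are time
# frames (x-axis), and each value stands in for amplitude (intensity).
spectrogram = [[float((f + 1) * (t % 3)) for t in range(8)] for f in range(6)]

# A tall kernel spans many frequency bins and few time frames;
# a wide kernel spans few frequency bins and many time frames.
tall_kernel = [[0.1] * 2 for _ in range(5)]   # 5 x 2
wide_kernel = [[0.1] * 5 for _ in range(2)]   # 2 x 5

tall_map = conv2d_valid(spectrogram, tall_kernel)   # 2 rows x 7 columns
wide_map = conv2d_valid(spectrogram, wide_kernel)   # 5 rows x 4 columns

# Rectangular pooling window: 1 frequency bin x 2 time frames.
pooled = max_pool(wide_map, 1, 2)                   # 5 rows x 2 columns
```

In this sketch the tall kernel summarizes structure across frequency at nearly a single instant, while the wide kernel summarizes how a narrow frequency band evolves over time; a trained network would learn the kernel weights rather than use fixed averages.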
Pages: 5571-5589
Number of pages: 19