Multi-Resolution Feature Extraction Algorithm in Emotional Speech Recognition

Cited by: 2
Authors
Zelenik, Ales [1]
Kacic, Zdravko [2]
Affiliations
[1] NXP Semicond Gratkorn GmbH, A-8101 Gratkorn, Austria
[2] Fac Elect Engn & Comp Sci, Maribor 2000, Slovenia
Keywords
Speech; emotion recognition; segmentation; multi-resolution
DOI
10.5755/j01.eee.21.5.13328
CLC classification number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract
This paper presents a new approach to recognizing emotional speech from audio recordings. Obtaining the optimum processing window width for feature extraction, and thus the highest recognition rates, requires a trade-off between time and frequency resolution. We define a new procedure that combines the advantages of narrower and wider windows by dynamically adjusting the time and frequency resolution of individual features. To achieve higher recognition rates, two major procedures are added to the multi-resolution feature-extraction concept: the exclusion of features calculated on different processing window widths, and the use of only those parts of the recordings with the most explicit emotions. To confirm the benefits of the algorithm, audio recordings from the emotional speech database Interface were evaluated with four different classifiers. The highest emotion recognition rate with the multi-resolution approach exceeded that of the best single-resolution approach by 3.5 %, with an average improvement of 1.5 % in absolute terms.
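The window-width trade-off described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the two features (short-time energy and zero-crossing rate), the window widths of 10, 25 and 50 ms, the 50 % hop, and all function names are assumptions chosen only for illustration. The sketch computes the same features at several window widths and reports the frequency resolution each width implies.

```python
import math

def frame_signal(signal, width, hop):
    """Split a sequence into overlapping frames of `width` samples."""
    return [signal[i:i + width] for i in range(0, len(signal) - width + 1, hop)]

def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose sign differs."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def multi_resolution_features(signal, sample_rate, widths_ms=(10, 25, 50)):
    """Compute the same features at several window widths (hypothetical helper).

    Narrow windows give many frames (fine time resolution) but coarse
    frequency resolution; wide windows give the opposite.
    """
    features = {}
    for ms in widths_ms:
        width = int(sample_rate * ms / 1000)
        hop = max(1, width // 2)  # assumed 50 % overlap
        frames = frame_signal(signal, width, hop)
        energies = [short_time_energy(f) for f in frames]
        zcrs = [zero_crossing_rate(f) for f in frames]
        features[ms] = {
            "n_frames": len(frames),
            "freq_resolution_hz": sample_rate / width,
            "mean_energy": sum(energies) / len(energies),
            "mean_zcr": sum(zcrs) / len(zcrs),
        }
    return features

# Synthetic stand-in for a recording: one second of a 200 Hz sine at 8 kHz.
sample_rate = 8000
signal = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(sample_rate)]
feats = multi_resolution_features(signal, sample_rate)
for ms, values in feats.items():
    print(ms, values["n_frames"], values["freq_resolution_hz"])
```

A multi-resolution scheme in the spirit of the abstract would then choose, per feature, which window width to keep. That selection step is not specified in this record and is left out of the sketch.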
Pages: 54-58
Page count: 5