Fractal-Based Speech Analysis for Emotional Content Estimation

Cited by: 0
Authors
Abrol, Akshita [1 ]
Kapoor, Nisha [2 ]
Lehana, Parveen Kumar [1 ]
Affiliations
[1] Univ Jammu, Dept Elect, DSP Lab, Jammu 180006, Jammu & Kashmir, India
[2] Univ Jammu, Sch Biotechnol, Jammu 180006, Jammu & Kashmir, India
Keywords
Speech emotion recognition; Emotion estimation; Fractal analysis; Katz algorithm; Fractal dimension; GMM
DOI
10.1007/s00034-021-01737-2
Chinese Library Classification (CLC)
TM [Electrical technology]; TN [Electronic technology, communication technology];
Discipline Classification Codes
0808; 0809;
Abstract
Estimating the emotional content of speech remains a challenge in building robust human-machine interaction systems. The accuracy of emotion estimation depends on the corpus used for training and on the acoustic features employed to model the speech signal. Emotion estimation is generally computationally expensive; hence, there is a need to develop alternative techniques. In this paper, a low-complexity fractal-based technique is explored. Our hypothesis is that fractal analysis provides better emotional content estimation because of the nonlinear nature of speech signals. The fractal analysis involves two parameters: the fractal dimension and the loop area. The fractal dimension is computed using the Katz algorithm. Investigations using a GMM-based model show that the proposed technique identifies the emotional content of a given speech signal reliably and accurately. The technique is also robust, remaining reliable at signal-to-noise ratios down to 10 dB, and the analysis shows that it is gender-insensitive. The investigations presented here are limited to phoneme-level analysis, although the technique works efficiently with speech phrases as well.
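
For orientation, the following is a minimal Python sketch (not the authors' code) of the two ingredients the abstract names: a frame-wise Katz fractal dimension and per-emotion GMM scoring. The frame length, hop size, GMM order, and helper names (frame_features, train_gmms, classify) are illustrative assumptions, and the paper's loop-area feature and preprocessing are not reproduced here.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def katz_fd(x):
        """Katz fractal dimension of a 1-D signal segment."""
        x = np.asarray(x, dtype=float)
        n = len(x) - 1                             # number of steps along the waveform curve
        L = np.sqrt(1.0 + np.diff(x) ** 2).sum()   # total curve length (unit sample spacing)
        idx = np.arange(len(x))
        d = np.sqrt(idx ** 2 + (x - x[0]) ** 2).max()  # planar extent: max distance from first point
        return np.log10(n) / (np.log10(n) + np.log10(d / L))

    def frame_features(signal, frame_len=320, hop=160):
        """Frame-wise Katz FD features; frame sizes are assumptions, not the paper's values."""
        feats = [katz_fd(signal[i:i + frame_len])
                 for i in range(0, len(signal) - frame_len, hop)]
        return np.array(feats).reshape(-1, 1)

    def train_gmms(corpus, n_components=4):
        """Fit one GMM per emotion; corpus maps emotion label -> list of waveforms."""
        return {emotion: GaussianMixture(n_components=n_components, random_state=0)
                         .fit(np.vstack([frame_features(s) for s in signals]))
                for emotion, signals in corpus.items()}

    def classify(models, signal):
        """Assign the emotion whose GMM gives the highest average log-likelihood."""
        feats = frame_features(signal)
        return max(models, key=lambda emotion: models[emotion].score(feats))

With a real corpus mapping emotion labels to waveforms, models = train_gmms(corpus) followed by classify(models, test_signal) reproduces the basic maximum-likelihood GMM decision rule; the paper additionally pairs the fractal dimension with the loop-area feature.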
Pages: 5632-5653
Page count: 22