Significance of Phonological Features in Speech Emotion Recognition

Cited by: 8

Authors:

Wang, Wei [1]

Watters, Paul A. [2]

Cao, Xinyi [1]

Shen, Lingjie [1]

Li, Bo [3]

Affiliations:

[1] Nanjing Normal Univ, Sch Educ Sci, Nanjing 210097, JS, Peoples R China

[2] La Trobe Univ, Dept Comp Sci & Informat Technol, Melbourne, Vic 3350, Australia

[3] Univ Southern Mississippi, Sch Comp Sci & Comp Engn, 730 East Beach Blvd, Long Beach, MS 39560 USA

Source:

INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY | 2020 / Vol. 23 / Issue 03

Funding:

National Natural Science Foundation of China;

Keywords:

Speech emotion recognition; Phonological features; Feature analysis; Acoustic features;

DOI:

10.1007/s10772-020-09734-7

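Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline classification code: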

0808 ; 0809 ;

Abstract:

A novel Speech Emotion Recognition (SER) method based on phonological features is proposed in this paper. Intuitively, as expert knowledge derived from linguistics, phonological features are correlated with emotions. However, they are seldom used as features to improve SER. Motivated by this, we aim to use phonological features to further improve SER accuracy, since they can provide complementary information for the task. We also explore the relationship between phonological features and emotions. Firstly, instead of relying only on acoustic features, we devise a new SER approach that fuses phonological representations with acoustic features. A significant improvement in SER performance is demonstrated on a publicly available SER database, Interactive Emotional Dyadic Motion Capture (IEMOCAP). Secondly, the experimental results show that the top-performing method for categorical emotion recognition is a deep learning-based classifier, which achieves an unweighted average recall (UAR) of 60.02%. Finally, we investigate the most discriminative features and find some patterns of emotional rhyme based on the phonological representations.
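For readers unfamiliar with the UAR metric reported in the abstract, the following is a minimal sketch of how unweighted average recall is commonly computed: recall is calculated per emotion class and then averaged, so each class counts equally regardless of how many samples it has. The function name and the toy labels are illustrative assumptions, not taken from the paper or from IEMOCAP.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recall; every class has equal weight."""
    total = defaultdict(int)    # samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical toy labels with an imbalanced class distribution.
y_true = ["angry", "angry", "angry", "angry", "happy", "happy", "sad", "neutral"]
y_pred = ["angry", "angry", "angry", "sad", "happy", "sad", "sad", "angry"]

uar = unweighted_average_recall(y_true, y_pred)
# Plain accuracy, for comparison: fraction of all samples predicted correctly.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR = {uar:.4f}, accuracy = {accuracy:.4f}")
```

On imbalanced emotion corpora the two numbers can differ noticeably, which is why UAR is often preferred over plain accuracy for SER benchmarks.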


Pages: 633-642

Page count: 10
