Autonomous Emotion Learning in Speech: A View of Zero-Shot Speech Emotion Recognition

Cited by: 11
Authors
Xu, Xinzhou [1]
Deng, Jun [2]
Cummins, Nicholas [3]
Zhang, Zixing [4]
Zhao, Li [5]
Schuller, Björn W. [3, 4]
Affiliations
[1] Nanjing University of Posts and Telecommunications, College of Internet of Things, Nanjing, China
[2] Agile Robots AG, Munich, Germany
[3] University of Augsburg, ZD.B Chair of Embedded Intelligence for Health Care and Wellbeing, Augsburg, Germany
[4] Imperial College London, Group on Language, Audio & Music (GLAM), London, UK
[5] Southeast University, School of Information Science and Engineering, Nanjing, China
Source
INTERSPEECH 2019 | 2019
Keywords
Autonomous emotion learning; speech emotion recognition; zero-shot learning; emotional attributes
DOI
10.21437/Interspeech.2019-2406
Chinese Library Classification: R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline Codes: 100104; 100213
Abstract
Conventionally, speech emotion recognition is achieved using passive learning approaches. Differing from such approaches, we herein propose and develop a dynamic method of autonomous emotion learning based on zero-shot learning. The proposed methodology employs emotional dimensions as the attributes in the zero-shot learning paradigm, resulting in two phases of learning, namely attribute learning and label learning. Attribute learning connects the paralinguistic features and attributes utilising speech with known emotional labels, while label learning aims at defining unseen emotions through the attributes. The experimental results achieved on the CINEMO corpus indicate that zero-shot learning is a useful technique for autonomous speech-based emotion learning, achieving accuracies considerably better than chance level and an attribute-based gold-standard setup. Furthermore, different emotion recognition tasks, emotional attributes, and employed approaches strongly influence system performance.
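The abstract describes a two-phase, attribute-based zero-shot scheme: attribute learning maps paralinguistic features to emotional dimensions using utterances with known labels, and label learning assigns unseen emotion labels via their attribute descriptions. The sketch below illustrates that general idea on synthetic data only; the Ridge regressors, the 88-dimensional feature size, and the arousal/valence prototypes for the hypothetical unseen emotions ("relief", "irritation") are illustrative assumptions, not the configuration reported in the paper.

# Minimal sketch of attribute-based zero-shot emotion recognition,
# loosely following the two phases named in the abstract.
# All data, dimensions, and prototypes below are illustrative assumptions.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Toy acoustic (paralinguistic) feature vectors: seen emotions come with
# emotional-dimension attributes (e.g. arousal/valence); unseen emotions
# are described by attribute prototypes only.
n_seen, n_unseen, n_feat = 200, 50, 88          # 88 ~ eGeMAPS-sized features
X_seen = rng.normal(size=(n_seen, n_feat))
A_seen = rng.uniform(-1, 1, size=(n_seen, 2))   # per-utterance arousal/valence

# Hypothetical (arousal, valence) prototypes for unseen emotion labels.
unseen_prototypes = {
    "relief":     np.array([-0.3,  0.6]),
    "irritation": np.array([ 0.5, -0.5]),
}

# Phase 1: attribute learning -- map acoustic features to emotional
# dimensions using the seen (labelled) data.
attribute_models = [Ridge(alpha=1.0).fit(X_seen, A_seen[:, d]) for d in range(2)]

# Phase 2: label learning -- predict attributes for unseen utterances, then
# assign the unseen emotion whose attribute prototype is closest.
X_unseen = rng.normal(size=(n_unseen, n_feat))
A_pred = np.column_stack([m.predict(X_unseen) for m in attribute_models])

labels = list(unseen_prototypes)
prototypes = np.stack([unseen_prototypes[l] for l in labels])
pred_idx = np.argmin(
    np.linalg.norm(A_pred[:, None, :] - prototypes[None, :, :], axis=-1), axis=1
)
predicted = [labels[i] for i in pred_idx]
print(predicted[:5])

In this reading, the attribute regressors play the role of attribute learning, while the nearest-prototype assignment stands in for label learning over unseen emotions; any stronger model could replace either stage.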
Pages: 949 - 953
Number of pages: 5