Cross-lingual speech emotion recognition via triple attentive asymmetric convolutional neural network

Cited by: 23
Authors
Ocquaye, Elias N. N. [1 ]
Mao, Qirong [1 ]
Xue, Yanfei [1 ]
Song, Heping [1 ]
Affiliations
[1] Jiangsu Univ, Dept Comp Sci & Commun Engn, Zhenjiang 212013, Jiangsu, Peoples R China
Keywords
center loss; cross-lingual; domain adaptation; speech emotion recognition; triple attentive asymmetric; features
DOI
10.1002/int.22291
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Cross-corpus speech emotion recognition (SER) via domain adaptation methods has gained wide acknowledgment as a way to build robust emotion recognition systems from different corpora or datasets. However, the cross-lingual setting remains a challenge in SER and needs further attention to handle the scenario in which training and testing involve different languages. In this paper, we propose a triple attentive asymmetric convolutional neural network to address cross-lingual and cross-corpus speech emotion recognition in an unsupervised manner. The proposed method adopts the joint supervision of softmax loss and center loss to learn highly discriminative feature representations for the target domain through the use of high-quality pseudo-labels. The model uses three attentive convolutional neural networks asymmetrically: two of the networks, trained on labeled source samples, artificially label the unlabeled target samples, and the third network learns salient, discriminative target features from these pseudo-labeled samples. We evaluate the proposed method on datasets in three languages (English, German, and Italian). The experimental results indicate that our method achieves higher prediction accuracy than other state-of-the-art methods.
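To illustrate the joint supervision described in the abstract, the following is a minimal PyTorch sketch of combining a softmax (cross-entropy) loss with a center loss over pseudo-labeled target features. The class names, dimensions, and weighting factor are illustrative assumptions and do not reproduce the authors' implementation or the triple-network architecture itself.

```python
# Hypothetical sketch: joint softmax + center loss supervision (PyTorch).
# All names and hyperparameters below are assumptions for illustration only.
import torch
import torch.nn as nn


class CenterLoss(nn.Module):
    """Penalizes the squared distance between features and their class centers."""

    def __init__(self, num_classes, feat_dim):
        super().__init__()
        # One learnable center per emotion class.
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))

    def forward(self, features, labels):
        # Select the center assigned to each sample's (pseudo-)label.
        batch_centers = self.centers[labels]                 # (B, feat_dim)
        return ((features - batch_centers) ** 2).sum(dim=1).mean()


# Assumed setup: 4 emotion classes, 256-d CNN embeddings, weight lambda = 0.5.
num_classes, feat_dim, lam = 4, 256, 0.5
ce_loss = nn.CrossEntropyLoss()
center_loss = CenterLoss(num_classes, feat_dim)

# Stand-ins for CNN feature embeddings, classifier logits, and pseudo-labels
# produced by the two labeling networks (random here for demonstration).
features = torch.randn(8, feat_dim, requires_grad=True)
logits = torch.randn(8, num_classes, requires_grad=True)
pseudo_labels = torch.randint(0, num_classes, (8,))

# Joint objective: cross-entropy for class separability plus center loss
# for intra-class compactness of the target-domain features.
loss = ce_loss(logits, pseudo_labels) + lam * center_loss(features, pseudo_labels)
loss.backward()
```

In this kind of scheme, the cross-entropy term keeps classes separable while the center-loss term pulls pseudo-labeled target features toward their class centers, which is one common way to obtain more discriminative target-domain representations.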
Pages: 53-71
Number of pages: 19