Spoken Language Identification in Unseen Target Domain Using Centroid Similarity Loss With Adaptive Gradient Blending

Cited by: 0

Authors:

Muralikrishna, H. [1]

Kumar, Sujeet [2]

Dinesh, Dileep Aroor [3]

Thenkanidiyoor, Veena [4]

Affiliations:

[1] Manipal Acad Higher Educ, Manipal Inst Technol, Dept Elect & Commun Engn, Manipal 576104, India

[2] Indian Inst Technol Mandi, MANAS Lab, Mandi 175075, Himachal Prades, India

[3] Indian Inst Technol Dharwad, Dept Comp Sci & Engn, Dharwad 580011, Karnataka, India

[4] Natl Inst Technol Goa, Dept Comp Sci & Engn, Ponda 403401, India

Source:

IEEE ACCESS | 2024 / Vol. 12

Keywords:

Feature extraction; Training; Robustness; Object recognition; Adaptive systems; Natural language processing; Gradient methods; Spoken language identification; unseen target domain; domain-mismatch; adaptive gradient blending; centroid similarity loss; DEEP NEURAL-NETWORKS; RECOGNITION;

DOI:

10.1109/ACCESS.2024.3422380

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

In this paper, we propose a centroid similarity loss (CSL) with adaptive gradient blending (AGB) strategy (denoted CSL-with-AGB) to improve the generalization of a spoken language identification (LID) system to unseen target domain conditions. Unlike most existing approaches, the proposed CSL-with-AGB can improve generalization even when the training dataset lacks domain diversity. Specifically, the LID network first analyses the input at two different temporal resolutions using two embedding extractors, which helps the network generalize better by encoding complementary contents. We then propose to use the CSL to further improve the generalization of the network by encouraging the embedding extractors to learn discriminative and domain-invariant embeddings. However, applying an auxiliary loss such as CSL can sometimes force the two embedding extractors to learn in an unbalanced way, diminishing their ability to encode complementary contents in the input. To overcome this issue, we propose to include the AGB strategy with the CSL. With the help of two auxiliary classifiers attached to the two embedding extractors, the AGB monitors and guides the extractors toward balanced learning, leading to enhanced performance in unseen target domain conditions.
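The abstract does not state the loss or the blending rule in closed form, so the following is only a minimal NumPy sketch of the general idea, under stated assumptions. It treats a centroid-based similarity loss as a softmax cross-entropy over cosine similarities between each embedding and per-class batch centroids, and it treats blending as a normalization of two auxiliary-classifier losses into branch weights. The function names, the `scale` parameter, and both formulas are hypothetical illustrations and should not be read as the authors' definitions.

```python
import numpy as np

def class_centroids(emb, labels, num_classes):
    """Mean embedding per class; classes absent from the batch keep a zero centroid."""
    centroids = np.zeros((num_classes, emb.shape[1]))
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            centroids[c] = emb[mask].mean(axis=0)
    return centroids

def cosine_sim(a, b, eps=1e-8):
    """Pairwise cosine similarity between rows of a and rows of b."""
    a_n = a / (np.linalg.norm(a, axis=1, keepdims=True) + eps)
    b_n = b / (np.linalg.norm(b, axis=1, keepdims=True) + eps)
    return a_n @ b_n.T

def centroid_similarity_loss(emb, labels, num_classes, scale=10.0):
    """ASSUMED form: cross-entropy over scaled cosine similarities to class centroids."""
    logits = scale * cosine_sim(emb, class_centroids(emb, labels, num_classes))
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(len(labels)), labels].mean()

def blend_weights(aux_losses):
    """ASSUMED rule: the branch with the larger auxiliary loss gets the larger weight."""
    aux_losses = np.asarray(aux_losses, dtype=float)
    return aux_losses / aux_losses.sum()

# Toy batch: two well-separated language classes in a 2-D embedding space.
emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
good_labels = np.array([0, 0, 1, 1])
bad_labels = np.array([0, 1, 0, 1])

loss_good = centroid_similarity_loss(emb, good_labels, num_classes=2)
loss_bad = centroid_similarity_loss(emb, bad_labels, num_classes=2)
weights = blend_weights([2.0, 1.0])
```

In this toy setup the loss is small when embeddings of the same class cluster around their centroid and larger when the labels cut across the clusters, which is the behaviour a discriminative embedding objective is meant to reward. The weighting function only illustrates the notion of steering two branches toward balance; how the paper actually derives its weights from the auxiliary classifiers is not given in the abstract.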


Pages: 95959-95971

Page count: 13
