Common latent representation learning for low-resourced spoken language identification

Cited: 0
Authors
Chen Chen
Yulin Bu
Yong Chen
Deyun Chen
Affiliations
[1] Harbin University of Science and Technology, School of Computer Science and Technology
[2] Harbin University of Science and Technology, Postdoctoral Research Station of Computer Science and Technology
Source
Multimedia Tools and Applications | 2024 / Volume 83
Keywords
Spoken language identification; Total variability space; I-vector; Common latent representation learning;
DOI: Not available
Abstract
The i-vector method is one of the mainstream methods in spoken language identification (SLID). It estimates the total variability space (TVS) to obtain a low-rank representation that characterizes the language, called the i-vector. However, on small-scale datasets the limited learning resources can significantly degrade the performance of an SLID system, so improving SLID performance under low-resourced conditions is necessary. In this paper, we propose a common latent representation learning (CLRL) method for learning the TVS, which introduces prior information, namely category labels and a parameter prior hypothesis, to compensate for the lack of information under low-resourced conditions. The CLRL method is evaluated on the OLR2020 dataset. Compared with other state-of-the-art methods, CLRL performs better on all datasets of different data scales, and it effectively improves the performance of the SLID system on low-resourced, small-scale datasets.
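For context only, the standard i-vector front end (Dehak et al.'s front-end factor analysis) can be sketched as follows; the notation is the conventional one and is our assumption, not the paper's own CLRL formulation. The utterance-dependent GMM mean supervector M is modeled as a low-rank offset within the TVS:

M = m + T\,w, \qquad w \sim \mathcal{N}(0, I),

where m is the universal background model (UBM) mean supervector, T is the low-rank total variability matrix spanning the TVS, and the i-vector is the posterior mean of the latent factor w given the utterance's Baum-Welch statistics.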
Pages: 34515-34535
Number of pages: 20