Common latent representation learning for low-resourced spoken language identification

Cited: 0
Authors
Chen Chen
Yulin Bu
Yong Chen
Deyun Chen
Affiliations
[1] Harbin University of Science and Technology, School of Computer Science and Technology
[2] Harbin University of Science and Technology, Postdoctoral Research Station of Computer Science and Technology
Source
Multimedia Tools and Applications | 2024 / Volume 83
Keywords
Spoken language identification; Total variability space; I-vector; Common latent representation learning;
DOI: Not available
Abstract
The i-vector method is one of the mainstream methods in spoken language identification (SLID). It estimates the total variability space (TVS) to obtain a low-rank representation that characterizes the language, called the i-vector. However, on small-scale datasets the limited learning resources can significantly degrade the performance of an SLID system, so improving SLID performance under low-resourced conditions is necessary. In this paper, we propose a common latent representation learning (CLRL) method for learning the TVS, which introduces prior information, namely category labels and a parameter prior hypothesis, to compensate for the lack of information under low-resourced conditions. The CLRL method is evaluated on the OLR2020 dataset. Compared with other state-of-the-art methods, CLRL performs better on all datasets of different data scales, and it effectively improves the performance of the SLID system on low-resourced, small-scale datasets.
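For context only, the standard i-vector front end (Dehak et al.'s front-end factor analysis) can be sketched as follows; the notation is the conventional one and is our assumption, not the paper's own CLRL formulation. The utterance-dependent GMM mean supervector M is modeled as a low-rank offset within the TVS:

M = m + T\,w, \qquad w \sim \mathcal{N}(0, I),

where m is the universal background model (UBM) mean supervector, T is the low-rank total variability matrix spanning the TVS, and the i-vector is the posterior mean of the latent factor w given the utterance's Baum-Welch statistics.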
Pages: 34515-34535
Number of pages: 20