Common latent representation learning for low-resourced spoken language identification

Cited by: 0
Authors
Chen, Chen [1, 2]
Bu, Yulin [1]
Chen, Yong [1]
Chen, Deyun [1, 2]
Affiliations
[1] Harbin Univ Sci & Technol, Sch Comp Sci & Technol, Harbin 150080, Heilongjiang, Peoples R China
[2] Harbin Univ Sci & Technol, Postdoctoral Res Stn Comp Sci & Technol, Harbin 150080, Heilongjiang, Peoples R China
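Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China; Natural Science Foundation of Heilongjiang Province
Keywords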
Spoken language identification; Total variability space; I-vector; Common latent representation learning; RECOGNITION; SPEECH;
DOI
10.1007/s11042-023-16865-x
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The i-vector method is one of the mainstream approaches to spoken language identification (SLID). It estimates the total variability space (TVS) to obtain a low-rank representation that characterizes the language, called the i-vector. On small-scale datasets, however, the limited training data can significantly degrade the performance of an SLID system, so improving performance under low-resource conditions is necessary. In this paper, we propose a common latent representation learning (CLRL) method for learning the TVS, which introduces prior information to compensate for the lack of information under low-resource conditions. The prior information includes category labels and a prior hypothesis on the parameters. The CLRL method is evaluated on the OLR2020 dataset. Compared with other state-of-the-art methods, it performs better on all datasets across the different data scales, and it effectively improves the performance of the SLID system on low-resource, small-scale datasets.
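As background for the abstract, the conventional i-vector formulation can be sketched as below. This is the standard textbook model, not the CLRL method itself, whose details this record does not give. All symbols ($M_u$, $m$, $T$, $w_u$, $N_u$, $\tilde{F}_u$, $\Sigma$) are conventional notation assumed here for illustration.

```latex
% Standard total variability model (assumed notation, not from this record):
% the utterance-dependent GMM mean supervector M_u is a low-rank offset
% from the universal background model (UBM) mean supervector m.
\begin{align}
  M_u &= m + T\,w_u, \qquad w_u \sim \mathcal{N}(0, I) \\
  % The i-vector is the posterior mean of w_u given the utterance statistics:
  \hat{w}_u &= \left(I + T^{\top}\Sigma^{-1} N_u\, T\right)^{-1}
               T^{\top}\Sigma^{-1}\tilde{F}_u
\end{align}
% T           : total variability matrix spanning the TVS (tall, low rank)
% \Sigma      : block-diagonal UBM covariance matrix
% N_u         : block-diagonal matrix of zeroth-order Baum-Welch statistics
% \tilde{F}_u : supervector of centered first-order Baum-Welch statistics
```

Under this model, estimating the TVS means estimating $T$ from training utterances. With little training data that estimate becomes unreliable, which is the situation the abstract says the label and parameter priors are meant to address.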
Pages: 34515-34535
Number of pages: 21