Common latent representation learning for low-resourced spoken language identification

Cited by: 0
Authors
Chen, Chen [1, 2]
Bu, Yulin [1]
Chen, Yong [1]
Chen, Deyun [1, 2]
Affiliations
[1] Harbin Univ Sci & Technol, Sch Comp Sci & Technol, Harbin 150080, Heilongjiang, Peoples R China
[2] Harbin Univ Sci & Technol, Postdoctoral Res Stn Comp Sci & Technol, Harbin 150080, Heilongjiang, Peoples R China
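Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation; Natural Science Foundation of Heilongjiang Province
Keywords
Spoken language identification; Total variability space; I-vector; Common latent representation learning; Recognition; Speech
DOI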
10.1007/s11042-023-16865-x
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The i-vector method is one of the mainstream methods in spoken language identification (SLID). It estimates the total variability space (TVS) to obtain a low-rank representation that characterizes the language, called the i-vector. However, on small-scale datasets, the scarcity of training resources can significantly degrade the performance of an SLID system. It is therefore necessary to improve SLID performance under low-resource conditions. In this paper, we propose a common latent representation learning (CLRL) method to learn the TVS, which introduces prior information to compensate for the lack of information under low-resource conditions. The prior information consists of category labels and a prior hypothesis on the parameters. The CLRL method is evaluated on the OLR2020 dataset. Compared with other state-of-the-art methods, the CLRL method performs better on datasets of every data scale tested. Moreover, the CLRL method effectively improves the performance of the SLID system on low-resource, small-scale datasets.
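For orientation, the conventional i-vector model that the abstract refers to can be sketched as follows. This is the standard total variability formulation, not the CLRL method itself, whose objective is not given in this record. The symbols (M, m, T, w, N, \tilde{F}, \Sigma) are notational assumptions and do not appear in the record.

```latex
% Standard total variability model (assumed notation, not taken from the record):
%   M         : utterance-dependent GMM mean supervector
%   m         : universal background model (UBM) mean supervector
%   T         : low-rank total variability matrix, whose columns span the TVS
%   w         : latent factor with a standard normal prior
\[
  M = m + T w, \qquad w \sim \mathcal{N}(0, I)
\]
% The i-vector is the posterior mean of w given the utterance statistics:
%   N         : block-diagonal matrix of zero-order Baum-Welch statistics
%   \tilde{F} : first-order statistics centered on the UBM means
%   \Sigma    : UBM covariance matrix
\[
  \hat{w} = \left( I + T^{\top} \Sigma^{-1} N T \right)^{-1} T^{\top} \Sigma^{-1} \tilde{F}
\]
```

In this formulation, T is typically estimated without labels. According to the abstract, CLRL instead learns the TVS using category labels and a prior hypothesis on the parameters.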
Pages: 34515-34535
Page count: 21