Common latent representation learning for low-resourced spoken language identification

Times cited: 0

Authors:

Chen, Chen [1, 2]

Bu, Yulin [1]

Chen, Yong [1]

Chen, Deyun [1, 2]

Affiliations:

[1] Harbin University of Science and Technology, School of Computer Science and Technology, Harbin 150080, Heilongjiang, People's Republic of China

[2] Harbin University of Science and Technology, Postdoctoral Research Station in Computer Science and Technology, Harbin 150080, Heilongjiang, People's Republic of China

Source:

MULTIMEDIA TOOLS AND APPLICATIONS | 2023 / Vol. 83 / No. 12

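Funding:

China Postdoctoral Science Foundation; National Natural Science Foundation of China; Natural Science Foundation of Heilongjiang Province;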

Keywords:

Spoken language identification; Total variability space; I-vector; Common latent representation learning; RECOGNITION; SPEECH;

DOI:

10.1007/s11042-023-16865-x

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

The i-vector method is one of the mainstream approaches to spoken language identification (SLID). It estimates a total variability space (TVS) and uses it to obtain a low-rank representation of an utterance, called the i-vector, that characterizes its language. On small-scale datasets, however, scarce training resources can significantly degrade the performance of an SLID system, so SLID performance under low-resource conditions needs to be improved. In this paper, we propose a common latent representation learning (CLRL) method for learning the TVS, which introduces prior information to compensate for the lack of information under low-resource conditions. The prior information includes category labels and prior hypotheses on the model parameters. The CLRL method is evaluated on the OLR2020 dataset. Compared with other state-of-the-art methods, CLRL performs better on all datasets, across different data scales. Moreover, CLRL effectively improves the performance of the SLID system on low-resource/small-scale datasets.
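As background for the abstract, the following is a minimal sketch of how a standard i-vector is read off a total variability space once that space has been learned. It is not the CLRL method, whose training of the TVS is not described in this record. Under the usual total variability model, an utterance's GMM mean supervector is M = m + Tw, and the i-vector is the posterior mean of w: w = L^{-1} sum_c T_c^T Sigma_c^{-1} F_c, with precision L = I + sum_c N_c T_c^T Sigma_c^{-1} T_c. Here N_c and F_c are the zeroth-order and centered first-order Baum-Welch statistics for UBM component c. The sketch assumes diagonal UBM covariances and precomputed statistics. The function names and toy numbers are hypothetical.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Build the augmented matrix [a | b] so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def extract_ivector(T: List[Matrix], sigma: List[Vector],
                    N: Vector, F: List[Vector]) -> Vector:
    """Posterior mean of w in M = m + Tw (hypothetical helper).

    T[c]     : D x R block of the total variability matrix for component c
    sigma[c] : diagonal of the c-th UBM covariance (length D)
    N[c]     : zeroth-order Baum-Welch statistic of component c
    F[c]     : centered first-order statistic of component c (length D)
    """
    R = len(T[0][0])
    # Start the precision matrix L at the identity (standard normal prior on w).
    precision = [[1.0 if i == j else 0.0 for j in range(R)] for i in range(R)]
    linear = [0.0] * R  # accumulates sum_c T_c^T Sigma_c^{-1} F_c
    for c, Tc in enumerate(T):
        for d, row in enumerate(Tc):
            inv_var = 1.0 / sigma[c][d]
            for i in range(R):
                linear[i] += row[i] * inv_var * F[c][d]
                for j in range(R):
                    # Adds N_c T_c^T Sigma_c^{-1} T_c one feature dimension at a time.
                    precision[i][j] += N[c] * row[i] * inv_var * row[j]
    return solve(precision, linear)


if __name__ == "__main__":
    # Toy example: one Gaussian component, 1-D features, 1-D i-vector.
    print(extract_ivector([[[2.0]]], [[1.0]], [3.0], [[6.0]]))
```

With zero statistics (an empty utterance) the sketch returns the prior mean w = 0. In practice the matrices are large, and a linear-algebra library would replace the hand-written solver.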


Pages: 34515-34535

Number of pages: 21
