LABERT: A Combination of Local Aggregation and Self-Supervised Speech Representation Learning for Detecting Informative Hidden Units in Low-Resource ASR Systems

Cited by: 1
Authors
Fatehi, Kavan [1 ]
Kucukyilmaz, Ayse [1 ]
Affiliations
[1] Univ Nottingham, Sch Comp Sci, Nottingham, England
Source
INTERSPEECH 2023 | 2023
Keywords
Self-Supervised Learning; BERT; Local Aggregation Function; Low-Resource Environment ASR
DOI
10.21437/Interspeech.2023-2001
Chinese Library Classification (CLC)
O42 [Acoustics]
Discipline Codes
070206; 082403
Abstract
With advances in deep learning methodologies, Automatic Speech Recognition (ASR) systems have seen impressive results. However, ASR in Low-Resource Environments (LREs) is challenged by a lack of training data for the specific target domain. We propose that data sampling criteria for choosing more informative speech samples can be critical to addressing the training-data bottleneck. Our proposed Local Aggregation BERT (LABERT) method for self-supervised speech representation learning fuses an active learning model with an adapted local aggregation metric. Active learning is used to pick informative speech units, whereas the aggregation metric forces the model to move similar data together in the latent space while separating dissimilar instances, in order to detect hidden units in LRE tasks. We evaluate LABERT on two LRE datasets, I-CUBE and UASpeech, to explore the performance of our model on LRE ASR problems.
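As a rough illustration of the aggregation idea described in the abstract, the sketch below implements a local-aggregation-style objective in PyTorch, loosely following the general local aggregation formulation (cf. deep-clustering approaches such as Caron et al., reference [6]) rather than the authors' actual LABERT implementation. All function names, masks, and hyperparameters here are illustrative assumptions; the active-learning component that selects informative speech units is not sketched. Close neighbours (e.g., cluster mates in the latent space) are pulled together, while a larger background neighbourhood provides the contrast set.

```python
# Illustrative sketch only: a local-aggregation-style loss in the spirit
# of the abstract's description ("move similar data together in the
# latent space while separating dissimilar instances"). This is NOT the
# paper's implementation; masks and hyperparameters are assumptions.
import torch
import torch.nn.functional as F


def local_aggregation_loss(z: torch.Tensor,
                           close_mask: torch.Tensor,
                           background_mask: torch.Tensor,
                           temperature: float = 0.07) -> torch.Tensor:
    """z: (N, D) latent vectors for N speech units.
    close_mask: (N, N) bool, True where j is a "close" neighbour of i
        (e.g., same k-means cluster in latent space).
    background_mask: (N, N) bool, True where j is among i's k nearest
        neighbours (the contrast set)."""
    z = F.normalize(z, dim=1)                 # unit-length embeddings
    sim = z @ z.t() / temperature             # scaled cosine similarities
    sim.fill_diagonal_(float("-inf"))         # exclude self-pairs
    exp_sim = sim.exp()                       # exp(-inf) -> 0 on the diagonal
    # Similarity mass on close neighbours, relative to the background set:
    close = (exp_sim * (close_mask & background_mask)).sum(dim=1)
    background = (exp_sim * background_mask).sum(dim=1)
    # Minimising this pulls cluster mates together and pushes the rest
    # of the local neighbourhood apart.
    return -torch.log(close.clamp_min(1e-12) / background.clamp_min(1e-12)).mean()


if __name__ == "__main__":
    # Toy usage with random embeddings and random neighbourhood masks.
    torch.manual_seed(0)
    z = torch.randn(8, 16)
    close = torch.rand(8, 8) > 0.5
    background = close | (torch.rand(8, 8) > 0.3)  # background contains close
    print(local_aggregation_loss(z, close, background))
```

In practice the temperature and the definitions of the close and background neighbourhoods are the main knobs of such an objective, and would presumably be tuned per LRE dataset; the published paper should be consulted for the actual LABERT formulation.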
Pages: 211-215
Number of pages: 5