DRAFT: A Novel Framework to Reduce Domain Shifting in Self-supervised Learning and Its Application to Children's ASR

Cited by: 11
Authors
Fan, Ruchao [1]
Alwan, Abeer [1]
Affiliation
[1] University of California, Los Angeles, Department of Electrical and Computer Engineering, Los Angeles, CA 90095, USA
Source
INTERSPEECH 2022 | 2022
Keywords
self-supervised learning; domain adaptation; children's ASR; end-to-end speech recognition
DOI
10.21437/Interspeech.2022-11128
Chinese Library Classification (CLC)
O42 [Acoustics]
Subject Classification Codes
070206; 082403
Abstract
Self-supervised learning (SSL) in the pretraining stage using un-annotated speech data has been successful in low-resource automatic speech recognition (ASR) tasks. However, models trained through SSL are biased toward the pretraining data, which usually differs from the data used in the finetuning tasks; this domain shifting problem limits knowledge transfer. We propose a novel framework, domain responsible adaptation and finetuning (DRAFT), to reduce domain shifting in pretrained speech models through an additional adaptation stage. In DRAFT, residual adapters (RAs) are inserted into the pretrained model to learn domain-related information with the same SSL loss as in the pretraining stage, and only the RA parameters are updated during the adaptation stage. DRAFT is agnostic to the type of SSL method used and is evaluated with three widely used approaches: APC, Wav2vec2.0, and HuBERT. On two child ASR tasks (the OGI and MyST databases), using SSL models pretrained on un-annotated adult speech data (Librispeech), relative WER improvements of up to 19.7% are observed compared to the pretrained models without adaptation. Additional experiments examine the potential of cross knowledge transfer between the two datasets, and the promising results suggest a broader usage of the proposed DRAFT framework.
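To make the adaptation stage concrete, the following is a minimal sketch of the residual-adapter idea described in the abstract, assuming a PyTorch-style pretrained backbone. The names ResidualAdapter, prepare_for_adaptation, and bottleneck_dim are illustrative assumptions, not the authors' implementation; the actual DRAFT code may differ in adapter placement and loss details.

# Sketch: bottleneck residual adapter and backbone freezing for the adaptation stage.
# Assumes a PyTorch model; all names below are hypothetical illustrations.
import torch
import torch.nn as nn

class ResidualAdapter(nn.Module):
    """Small bottleneck module added residually on top of a frozen layer's output."""
    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        self.act = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection: the adapter refines, rather than replaces, the pretrained feature.
        return x + self.up(self.act(self.down(x)))

def prepare_for_adaptation(backbone: nn.Module, adapters: nn.ModuleList):
    """Freeze the pretrained backbone; only adapter parameters remain trainable."""
    for p in backbone.parameters():
        p.requires_grad = False
    return list(adapters.parameters())

# Adaptation stage (sketch): optimize only the adapters on target-domain (child) speech,
# reusing the same SSL objective as pretraining (e.g., masked prediction for HuBERT).
# optimizer = torch.optim.Adam(prepare_for_adaptation(backbone, adapters), lr=1e-4)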
Pages: 4900-4904
Number of pages: 5