DUAL LEARNING FOR LARGE VOCABULARY ON-DEVICE ASR

Times Cited: 1
Authors
Peyser, Cal [1 ,2 ]
Huang, Ronny [2 ]
Sainath, Tara [2 ]
Prabhavalkar, Rohit [2 ]
Picheny, Michael [1 ]
Cho, Kyunghyun [1 ]
Affiliations
[1] NYU, Ctr Data Sci, New York, NY 10012 USA
[2] Google Inc, Menlo Pk, CA USA
Source
2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT | 2022
Keywords
SPEECH;
DOI
10.1109/SLT54892.2023.10023407
CLC Classification Code
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Dual learning is a paradigm for semi-supervised machine learning that seeks to leverage unsupervised data by solving two opposite tasks at once. In this scheme, each model is used to generate pseudo-labels for unlabeled examples that are used to train the other model. Dual learning has seen some use in speech processing by pairing ASR and TTS as dual tasks. However, these results mostly address only the case of using unpaired examples to compensate for very small supervised datasets, and mostly on large, non-streaming models. Dual learning has not yet been proven effective for using unsupervised data to improve realistic on-device streaming models that are already trained on large supervised corpora. We provide this missing piece through an analysis of an on-device-sized streaming Conformer trained on the entirety of Librispeech, showing relative WER improvements of 10.7%/5.2% without an LM and 11.7%/16.4% with an LM.
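The abstract describes the dual-learning scheme only at a high level. The sketch below illustrates the mutual pseudo-labeling loop it refers to, pairing an ASR model and a TTS model on unpaired audio and text. All class names, toy models, and data here are hypothetical stand-ins for illustration; they do not reproduce the paper's streaming Conformer or its training recipe.

# Illustrative sketch of a dual-learning pseudo-labeling loop (assumed setup,
# not the paper's implementation): TTS labels unpaired text for ASR training,
# and ASR labels unpaired audio for TTS training.

import random
from typing import List, Tuple


class ToyAsr:
    """Stand-in ASR: maps 'audio' (a list of floats) to a transcript string."""

    def transcribe(self, audio: List[float]) -> str:
        # A real system would run a streaming Conformer decoder here.
        return " ".join("tok%d" % int(x * 10) for x in audio)

    def train_step(self, audio: List[float], text: str) -> float:
        # Placeholder supervised update on one (audio, text) pair.
        return random.random()


class ToyTts:
    """Stand-in TTS: maps a transcript string to synthetic 'audio'."""

    def synthesize(self, text: str) -> List[float]:
        return [len(word) / 10.0 for word in text.split()]

    def train_step(self, audio: List[float], text: str) -> float:
        return random.random()


def dual_learning_step(asr: ToyAsr, tts: ToyTts,
                       unpaired_audio: List[List[float]],
                       unpaired_text: List[str]) -> Tuple[float, float]:
    """One round of mutual pseudo-labeling and training on unpaired data."""
    asr_loss = 0.0
    tts_loss = 0.0
    # TTS pseudo-labels unpaired text: synthetic (audio, text) pairs train ASR.
    for text in unpaired_text:
        synthetic_audio = tts.synthesize(text)
        asr_loss += asr.train_step(synthetic_audio, text)
    # ASR pseudo-labels unpaired audio: hypothesized (audio, text) pairs train TTS.
    for audio in unpaired_audio:
        hypothesis = asr.transcribe(audio)
        tts_loss += tts.train_step(audio, hypothesis)
    return asr_loss, tts_loss


if __name__ == "__main__":
    asr, tts = ToyAsr(), ToyTts()
    audio_only = [[0.1, 0.5, 0.9], [0.3, 0.7]]          # unlabeled speech
    text_only = ["hello world", "dual learning for asr"]  # unpaired transcripts
    print(dual_learning_step(asr, tts, audio_only, text_only))

In the paper's setting the same loop is applied on top of models already trained on a large supervised corpus, rather than as a substitute for supervision.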
Pages: 245-251
Number of Pages: 7