Learning Adapters for Code-Switching Speech Recognition

Cited by: 1
Authors
He, Chun-Yi [1 ]
Chien, Jen-Tzung [1 ]
Affiliations
[1] Natl Yang Ming Chiao Tung Univ, Inst Elect & Comp Engn, Hsinchu, Taiwan
Source
2023 ASIA PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE, APSIPA ASC | 2023
Keywords
LANGUAGE IDENTIFICATION;
DOI
10.1109/APSIPAASC58517.2023.10317410
Chinese Library Classification
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Multilingual code-switching speech recognition has become an emerging research direction in real-world applications, since most speakers are bilingual or multilingual. A code-switching sentence mixes two or more languages, often within the same sentence. It is crucial to build a multilingual speech recognizer for code-switching through parameter-efficient learning on top of a pre-trained encoder. Under this scheme, it is essential to identify the languages within a single spoken utterance. However, collecting monolingual speech data is easier than collecting code-switching speech in multiple languages. This study develops a new Mandarin-English code-switching speech recognizer by utilizing a large-scale pre-trained backbone model covering 53 monolingual languages. The backbone model is fine-tuned by introducing controllable language or task adapters and incorporating a small amount of Mandarin-English code-switching speech, where the backbone is frozen and only the individual adapters for Mandarin and English are estimated. Thus only a limited number of controllable parameters needs to be trained. Experiments on code-switching speech recognition for Taiwanese Mandarin and English show the merit of the proposed method.
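The frozen-backbone adapter scheme described in the abstract can be sketched as a small bottleneck module inserted into the encoder, with one adapter per language. The following is a minimal illustration under the assumption of standard bottleneck (Houlsby-style) adapters; all names, dimensions, and the per-language adapter dictionary are hypothetical and not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_bottleneck = 8, 2  # backbone width vs. tiny adapter width (illustrative)

class Adapter:
    """Bottleneck adapter: down-project, nonlinearity, up-project, residual.
    Only these small matrices would be trained; the backbone stays frozen."""
    def __init__(self, d_model, d_bottleneck):
        # near-zero init keeps the adapter close to an identity map at start
        self.W_down = rng.normal(0.0, 0.02, (d_model, d_bottleneck))
        self.W_up = rng.normal(0.0, 0.02, (d_bottleneck, d_model))

    def __call__(self, h):
        z = np.maximum(h @ self.W_down, 0.0)  # down-projection + ReLU
        return h + z @ self.W_up              # up-projection + residual

# One adapter per language; a language-identification signal would select
# which adapter processes each segment of a code-switched utterance.
adapters = {"mandarin": Adapter(d_model, d_bottleneck),
            "english": Adapter(d_model, d_bottleneck)}

h = rng.normal(size=(4, d_model))   # stand-in for frozen-backbone hidden states
out = adapters["mandarin"](h)       # output keeps the backbone's shape
```

Because of the residual connection and near-zero initialization, each adapter starts as an approximate identity, so training on a small amount of code-switching data only has to learn a lightweight per-language correction.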
Pages: 344-349
Page count: 6