Learning Adapters for Code-Switching Speech Recognition

Cited by: 1
Authors
He, Chun-Yi [1 ]
Chien, Jen-Tzung [1 ]
Affiliations
[1] Natl Yang Ming Chiao Tung Univ, Inst Elect & Comp Engn, Hsinchu, Taiwan
Source
2023 ASIA PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE, APSIPA ASC | 2023
Keywords
LANGUAGE IDENTIFICATION;
DOI
10.1109/APSIPAASC58517.2023.10317410
Chinese Library Classification
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Multilingual code-switching speech recognition has become an emerging research direction in real-world applications, since most speakers are bilingual or multilingual. A code-switching sentence mixes two or more languages, often within the same sentence. It is crucial to build a multilingual speech recognizer for code-switching through parameter-efficient learning on top of a pre-trained encoder. Under this scheme, it is essential to identify the languages within a single spoken utterance. However, collecting monolingual speech data is easier than collecting code-switching speech in multiple languages. This study develops a new Mandarin-English code-switching speech recognizer by utilizing a large-scale pre-trained backbone model covering 53 monolingual languages. The backbone model is fine-tuned by introducing controllable language or task adapters and incorporating a small amount of Mandarin-English code-switching speech, where the backbone is frozen and only the individual adapters for Mandarin and English are estimated. Thus only a limited number of controllable parameters needs to be trained. Experiments on code-switching speech recognition for Taiwanese Mandarin and English show the merit of the proposed method.
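The frozen-backbone adapter scheme described in the abstract can be sketched as a small bottleneck module inserted into the encoder, with one adapter per language. The following is a minimal illustration under the assumption of standard bottleneck (Houlsby-style) adapters; all names, dimensions, and the per-language adapter dictionary are hypothetical and not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_bottleneck = 8, 2  # backbone width vs. tiny adapter width (illustrative)

class Adapter:
    """Bottleneck adapter: down-project, nonlinearity, up-project, residual.
    Only these small matrices would be trained; the backbone stays frozen."""
    def __init__(self, d_model, d_bottleneck):
        # near-zero init keeps the adapter close to an identity map at start
        self.W_down = rng.normal(0.0, 0.02, (d_model, d_bottleneck))
        self.W_up = rng.normal(0.0, 0.02, (d_bottleneck, d_model))

    def __call__(self, h):
        z = np.maximum(h @ self.W_down, 0.0)  # down-projection + ReLU
        return h + z @ self.W_up              # up-projection + residual

# One adapter per language; a language-identification signal would select
# which adapter processes each segment of a code-switched utterance.
adapters = {"mandarin": Adapter(d_model, d_bottleneck),
            "english": Adapter(d_model, d_bottleneck)}

h = rng.normal(size=(4, d_model))   # stand-in for frozen-backbone hidden states
out = adapters["mandarin"](h)       # output keeps the backbone's shape
```

Because of the residual connection and near-zero initialization, each adapter starts as an approximate identity, so training on a small amount of code-switching data only has to learn a lightweight per-language correction.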
Pages: 344-349
Page count: 6