Adapting Code-Switching Language Models with Statistical-Based Text Augmentation

Cited by: 0
Authors
Prachaseree, Chaiyasait [1]
Gupta, Kshitij [2]
Thi Nga Ho [1]
Peng, Yizhou [2]
Tun, Kyaw Zin [1]
Chng, Eng Siong [1]
Chalapthi, G. S. S. [2]
Affiliations
[1] Nanyang Technological University, Singapore
[2] National University of Singapore, Singapore
Keywords
Code-switching; Language Modeling; Data Augmentation
DOI
10.1007/978-981-99-5837-5_26
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper introduces a statistical augmentation approach for generating code-switched sentences for code-switched language modeling. The proposed technique converts monolingual sentences from a particular domain into their corresponding code-switched versions using pretrained monolingual Part-of-Speech tagging models. The work also shows that adding 150 handcrafted formal-to-informal word replacements can further improve the naturalness of the augmented sentences. When tested on an English-Malay code-switching corpus, interpolating an n-gram language model with a language model trained on the augmented texts and other monolingual texts yielded a relative perplexity reduction of 9.7%; for RNNLMs, the perplexity reduction was 5.9%.
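The two quantities the abstract relies on, language model interpolation and relative perplexity reduction, can be illustrated with a minimal sketch. It assumes simple linear interpolation over toy unigram probability tables; the function names, the interpolation weight `lam`, the probabilities, and the example sentence are all hypothetical and are not taken from the paper, which works with n-gram models and RNNLMs.

```python
import math


def interpolate(p_base, p_aug, lam):
    """Linear interpolation: lam * p_base(w) + (1 - lam) * p_aug(w) for every word."""
    vocab = set(p_base) | set(p_aug)
    return {
        w: lam * p_base.get(w, 0.0) + (1.0 - lam) * p_aug.get(w, 0.0)
        for w in vocab
    }


def perplexity(model, tokens):
    """Perplexity is exp of the average negative log-probability per token."""
    log_prob = sum(math.log(model[t]) for t in tokens)
    return math.exp(-log_prob / len(tokens))


def relative_reduction(ppl_before, ppl_after):
    """Relative perplexity reduction, e.g. 0.097 corresponds to 9.7%."""
    return (ppl_before - ppl_after) / ppl_before


# Toy unigram tables (made-up numbers, for illustration only).
p_base = {"saya": 0.1, "nak": 0.1, "pergi": 0.1, "meeting": 0.1, "the": 0.6}
p_aug = {"saya": 0.3, "nak": 0.2, "pergi": 0.2, "meeting": 0.2, "the": 0.1}

# A made-up code-switched test sentence.
tokens = ["saya", "nak", "pergi", "meeting"]

mixed = interpolate(p_base, p_aug, lam=0.5)
ppl_base = perplexity(p_base, tokens)
ppl_mixed = perplexity(mixed, tokens)
print(f"baseline perplexity:     {ppl_base:.2f}")
print(f"interpolated perplexity: {ppl_mixed:.2f}")
print(f"relative reduction:      {relative_reduction(ppl_base, ppl_mixed):.1%}")
```

In practice the interpolation weight is usually tuned on held-out data rather than fixed at 0.5; the value here is only a placeholder.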
Pages: 310-322
Page count: 13