Adapting Code-Switching Language Models with Statistical-Based Text Augmentation

Authors
Prachaseree, Chaiyasait [1]
Gupta, Kshitij [2]
Thi Nga Ho [1]
Peng, Yizhou [2]
Tun, Kyaw Zin [1]
Chng, Eng Siong [1]
Chalapthi, G. S. S. [2]
Affiliations
[1] Nanyang Technol Univ, Singapore, Singapore
[2] Natl Univ Singapore, Singapore, Singapore
Keywords
Code-switching; Language Modeling; Data Augmentation;
DOI
10.1007/978-981-99-5837-5_26
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper introduces a statistical augmentation approach that generates code-switched sentences for code-switched language modeling. The proposed technique converts monolingual sentences from a particular domain into their corresponding code-switched versions using pretrained monolingual Part-of-Speech tagging models. The work also shows that adding 150 handcrafted formal-to-informal word replacements can further improve the naturalness of the augmented sentences. When tested on an English-Malay code-switching corpus, an n-gram language model interpolated with a language model trained on the augmented texts and other monolingual texts showed a relative perplexity decrease of 9.7%, and a 5.9% perplexity reduction was observed for RNNLMs.
Pages: 310-322
Number of pages: 13
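
The abstract reports results in terms of language model interpolation and relative perplexity reduction. As a rough illustration of those two quantities only, the following sketch uses toy unigram models; it is not the authors' implementation, and the function names, the interpolation weight, and the probability tables are all hypothetical.

```python
import math

def interpolate(p_a, p_b, lam):
    """Linear interpolation of two word distributions:
    p(w) = lam * p_a(w) + (1 - lam) * p_b(w)."""
    vocab = set(p_a) | set(p_b)
    return {w: lam * p_a.get(w, 0.0) + (1.0 - lam) * p_b.get(w, 0.0)
            for w in vocab}

def perplexity(model, tokens):
    """Perplexity of a unigram model on a token sequence:
    exp of the negative mean log-probability."""
    log_sum = sum(math.log(model[t]) for t in tokens)
    return math.exp(-log_sum / len(tokens))

def relative_reduction(ppl_base, ppl_new):
    """Relative perplexity decrease with respect to the baseline."""
    return (ppl_base - ppl_new) / ppl_base

# Toy, made-up distributions (assumption: not taken from the paper).
baseline = {"i": 0.4, "want": 0.4, "makan": 0.2}
augmented = {"i": 0.3, "want": 0.3, "makan": 0.4}
mixed = interpolate(baseline, augmented, lam=0.5)

# A tiny code-switched test sequence, also made up.
test_tokens = ["i", "want", "makan", "makan"]
ppl_base = perplexity(baseline, test_tokens)
ppl_mixed = perplexity(mixed, test_tokens)
print(f"relative reduction: {relative_reduction(ppl_base, ppl_mixed):.3f}")
```

In practice the interpolation weight would be tuned on held-out data, and the component models would be n-gram or neural models rather than unigram tables.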