Adapting Code-Switching Language Models with Statistical-Based Text Augmentation

Cited by: 0
Authors
Prachaseree, Chaiyasait [1]
Gupta, Kshitij [2]
Thi Nga Ho [1]
Peng, Yizhou [2]
Tun, Kyaw Zin [1]
Chng, Eng Siong [1]
Chalapthi, G. S. S. [2]
Affiliations
[1] Nanyang Technol Univ, Singapore, Singapore
[2] Natl Univ Singapore, Singapore, Singapore
Keywords
Code-switching; Language Modeling; Data Augmentation;
DOI
10.1007/978-981-99-5837-5_26
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper introduces a statistical augmentation approach that generates code-switched sentences for code-switched language modeling. The proposed technique converts monolingual sentences from a particular domain into their corresponding code-switched versions using pretrained monolingual Part-of-Speech tagging models. The work also shows that adding 150 handcrafted formal-to-informal word replacements can further improve the naturalness of the augmented sentences. When tested on an English-Malay code-switching corpus, interpolating an n-gram language model with a language model trained on the augmented texts and other monolingual texts gave a relative perplexity reduction of 9.7%; for RNNLMs, the perplexity reduction was 5.9%.
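The evaluation terms used in the abstract (language model interpolation, perplexity, relative reduction) can be illustrated with a minimal sketch. It assumes plain linear interpolation of per-token probabilities, which the abstract does not confirm as the authors' exact setup. The function names, the weight `lam`, and the toy probabilities are all hypothetical and are not taken from the paper.

```python
import math


def interpolate(p_a, p_b, lam):
    """Linearly interpolate two per-token probabilities with weight lam (assumed scheme)."""
    return lam * p_a + (1.0 - lam) * p_b


def perplexity(token_probs):
    """Perplexity is the exponential of the mean negative log-probability per token."""
    mean_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_nll)


def relative_reduction(baseline_ppl, new_ppl):
    """Relative perplexity reduction, e.g. 0.097 corresponds to 9.7%."""
    return (baseline_ppl - new_ppl) / baseline_ppl


# Toy per-token probabilities (made up for illustration, not results from the paper).
base_probs = [0.10, 0.02, 0.05, 0.01]  # hypothetical baseline LM
aug_probs = [0.08, 0.06, 0.04, 0.05]   # hypothetical LM trained on augmented text

mixed_probs = [interpolate(a, b, 0.5) for a, b in zip(base_probs, aug_probs)]

base_ppl = perplexity(base_probs)
mixed_ppl = perplexity(mixed_probs)
print(f"baseline perplexity:     {base_ppl:.2f}")
print(f"interpolated perplexity: {mixed_ppl:.2f}")
print(f"relative reduction:      {relative_reduction(base_ppl, mixed_ppl):.1%}")
```

In practice the interpolation weight would typically be tuned on held-out data; the fixed value of 0.5 here is only a placeholder.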
Pages: 310-322
Page count: 13
Related papers
50 in total
  • [31] Bilingual literary creativity: language shift or code-switching?
    Georgy, Khukhuni T.
    Irina, Valuitseva I.
    FILOLOGICHESKIE NAUKI-NAUCHNYE DOKLADY VYSSHEI SHKOLY-PHILOLOGICAL SCIENCES-SCIENTIFIC ESSAYS OF HIGHER EDUCATION, 2021, (06): : 227 - 233
  • [32] Youth language in Beasain: the influence of the context on code-switching
    Iparragirre, Maddi Aiestaran
    EUSKERA, 2024, 69 (01):
  • [33] Code-switching in conversation: Language, interaction and identity.
    Backus, A
    JOURNAL OF PRAGMATICS, 2000, 32 (06) : 831 - 838
  • [34] Code-Mixing and Code-Switching on Social Media Text: A Brief Survey
    Mangla, Ankur
    Bansal, Rakesh Kumar
    Bansal, Savina
    Proceedings of the 2023 IEEE International Conference on Computer Vision and Machine Intelligence, CVMI 2023, 2023,
  • [35] Ensemble of Binary Classification for the Emotion Detection in Code-Switching Text
    Zhang, Xinghua
    Zhang, Chunyue
    Shi, Huaxing
    NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, NLPCC 2018, PT II, 2018, 11109 : 178 - 189
  • [36] Language Control and Code-Switching in Bilingual Children With Developmental Language Disorder
    Gross, Megan C.
    Kaushanskaya, Margarita
    JOURNAL OF SPEECH LANGUAGE AND HEARING RESEARCH, 2022, 65 (03): : 1104 - 1127
  • [37] The Influence Of Malay Language And English Language In French Code-switching Strategies
    Halim, Hazlina Abdul
    GEMA ONLINE JOURNAL OF LANGUAGE STUDIES, 2012, 12 (02): : 693 - 709
  • [38] TEACHER'S CODE-SWITCHING IN FIRST LANGUAGE IN ENGLISH LANGUAGE CLASSES
    Naka, Laura
    PSYCHOLOGY AND PSYCHIATRY, SOCIOLOGY AND HEALTHCARE, EDUCATION, VOL III, 2014, : 847 - 854
  • [39] Improving code-switching speech recognition with data augmentation and system combination
    Ma, Duo
    Xu, Haihua
    Li, Guanyu
    Chng, Eng Siong
    2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019, : 1308 - 1312
  • [40] DATA AUGMENTATION FOR END-TO-END CODE-SWITCHING SPEECH RECOGNITION
    Du, Chenpeng
    Li, Hao
    Lu, Yizhou
    Wang, Lan
    Qian, Yanmin
    2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 194 - 200