Adapting Code-Switching Language Models with Statistical-Based Text Augmentation

Cited by: 0
Authors
Prachaseree, Chaiyasait [1]
Gupta, Kshitij [2]
Thi Nga Ho [1]
Peng, Yizhou [2]
Tun, Kyaw Zin [1]
Chng, Eng Siong [1]
Chalapthi, G. S. S. [2]
Affiliations
[1] Nanyang Technol Univ, Singapore, Singapore
[2] Natl Univ Singapore, Singapore, Singapore
Keywords
Code-switching; Language Modeling; Data Augmentation;
DOI
10.1007/978-981-99-5837-5_26
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper introduces a statistical augmentation approach that generates code-switched sentences for code-switched language modeling. The proposed technique converts monolingual sentences from a particular domain into their corresponding code-switched versions using pretrained monolingual Part-of-Speech tagging models. The work also shows that adding 150 handcrafted formal-to-informal word replacements can further improve the naturalness of the augmented sentences. When tested on an English-Malay code-switching corpus, an n-gram language model interpolated with a language model trained on the augmented texts and other monolingual texts achieved a relative perplexity reduction of 9.7%, and RNNLMs achieved a perplexity reduction of 5.9%.
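The perplexity results in the abstract are relative reductions for an interpolated language model. The following is a minimal sketch of how such a figure can be computed, assuming linear interpolation with a single weight `lam`; the abstract does not state the interpolation scheme or the weights used, and the function names and the numbers in the usage lines are hypothetical illustrations, not values from the paper.

```python
import math


def interpolate(p_base, p_aug, lam):
    """Linearly interpolate two per-token probabilities.

    lam is an assumed mixing weight for the base model (0 <= lam <= 1).
    """
    return lam * p_base + (1.0 - lam) * p_aug


def perplexity(token_probs):
    """Perplexity: exp of the mean negative log-probability per token."""
    total_log_prob = sum(math.log(p) for p in token_probs)
    return math.exp(-total_log_prob / len(token_probs))


def relative_reduction(ppl_baseline, ppl_new):
    """Relative perplexity reduction of ppl_new with respect to ppl_baseline."""
    return (ppl_baseline - ppl_new) / ppl_baseline


# Illustrative per-token probabilities (made up, not from the paper).
base_probs = [0.10, 0.05, 0.20]
aug_probs = [0.15, 0.10, 0.20]
mixed_probs = [interpolate(b, a, lam=0.5) for b, a in zip(base_probs, aug_probs)]

reduction = relative_reduction(perplexity(base_probs), perplexity(mixed_probs))
print(f"relative perplexity reduction: {reduction:.1%}")
```

Under this definition, a 9.7% relative reduction means the interpolated model's perplexity is 90.3% of the baseline's.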
Pages: 310 - 322
Page count: 13
Related papers
50 in total
  • [21] TRAINING CODE-SWITCHING LANGUAGE MODEL WITH MONOLINGUAL DATA
    Chuang, Shun-Po
    Sung, Tzu-Wei
    Lee, Hung-yi
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 7949 - 7953
  • [22] CODE-SWITCHING AND LANGUAGE DOMINANCE - SOME INITIAL FINDINGS
    VALDESFALLIS, G
    GENERAL LINGUISTICS, 1978, 18 (02): : 90 - 104
  • [23] The Use of Code-switching in English Foreign Language Classroom
    徐婷
    海外英语, 2018, (16) : 228 - 230
  • [24] EMBARRASSMENT AND CODE-SWITCHING INTO A 2ND LANGUAGE
    BOND, MH
    LAI, TM
    JOURNAL OF SOCIAL PSYCHOLOGY, 1986, 126 (02): : 179 - 186
  • [25] Teachers' Use of Code-Switching in Foreign Language Classroom
    Sulaiman, Alaa Alshaikh
    PROCEEDINGS OF THE 11TH INNOVATION IN LANGUAGE LEARNING INTERNATIONAL CONFERENCE, 2018, : 284 - 287
  • [26] The Impact of Language Code-Switching on Ad Claim Recall
    Bishop, Melissa Maier
    Peterson, Mark
    ADVANCES IN CONSUMER RESEARCH, VOL 35, 2008, 35 : 831 - 832
  • [27] Code-switching and the optimal grammar of bilingual language use
    Bhatt, Rakesh M.
    Bolonyai, Agnes
    BILINGUALISM-LANGUAGE AND COGNITION, 2011, 14 (04) : 522 - 546
  • [28] Code-switching in bilingual children with specific language impairment
    Gutierrez-Clellen, Vera F.
    Simon-Cereijido, Gabriela
    Leone, Angela Erickson
    INTERNATIONAL JOURNAL OF BILINGUALISM, 2009, 13 (01) : 91 - 109
  • [29] Code-Switching by Spanish-English Bilingual Children in a Code-Switching Conversation Sample: Roles of Language Proficiency, Interlocutor Behavior, and Parent-Reported Code-Switching Experience
    Gross, Megan C.
    Gonzalez, Ada C. Lopez
    Girardin, Maria G.
    Almeida, Adriana M.
    LANGUAGES, 2022, 7 (04)
  • [30] Subjectivity Analysis of an Enhanced Feature Set for Code-Switching Text
    Kasmuri, Emaliana
    Basiron, Halizah
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2024, 15 (09) : 450 - 460