Effects of Dialectal Code-Switching on Speech Modules: A Study using Egyptian Arabic Broadcast Speech

Cited by: 6
Authors
Chowdhury, Shammur A. [1]
Samih, Younes [1]
Eldesouki, Mohamed [2]
Ali, Ahmed [1]
Affiliations
[1] Hamad Bin Khalifa University (HBKU), Qatar Computing Research Institute, Doha, Qatar
[2] Concordia University, Montreal, QC, Canada
Source
INTERSPEECH 2020 | 2020
Keywords
code-switching; dialect identification; corpus; code mixing index
DOI
10.21437/Interspeech.2020-2271
Chinese Library Classification
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification codes
100104; 100213
Abstract
Intra-utterance code-switching (CS) is the alternation between two or more languages within the same utterance. Although spoken dialectal code-switching (DCS) is more challenging than CS, it remains largely unexplored. In this study, we describe a method to build the first spoken DCS corpus. The corpus is annotated at the token level, taking into account both linguistic and acoustic cues for dialectal Arabic. For detailed analysis, we study Arabic automatic speech recognition (ASR), Arabic dialect identification (ADI), and natural language processing (NLP) modules on the DCS corpus. Our results highlight the importance of lexical information for discriminating the DCS labels. We observe that the performance of the different models depends strongly on the degree of code-mixing at the token level as well as on its complexity at the utterance level.
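The abstract and keywords refer to a code mixing index but do not define it in this record. As an illustration only, the sketch below computes an utterance-level index in the commonly cited form CMI = 100 * (1 - max_i(w_i) / (n - u)), where n is the number of tokens, u the number of label-neutral tokens, and w_i the token count for label i. The function name, the label strings ("MSA", "EGY", "other"), and the choice of this particular formulation are assumptions and may differ from what the paper actually uses.

```python
from collections import Counter


def code_mixing_index(tags, neutral=("other",)):
    """Utterance-level code mixing index on a 0-100 scale.

    `tags` holds one label per token. Labels listed in `neutral`
    (hypothetical default: "other") are excluded from the count,
    which corresponds to the (n - u) denominator.
    """
    counted = [t for t in tags if t not in neutral]
    if not counted:
        # No label-bearing tokens: treat the utterance as unmixed.
        return 0.0
    dominant = max(Counter(counted).values())
    return 100.0 * (1 - dominant / len(counted))


# Hypothetical token-level labels for a four-token utterance.
mixed = code_mixing_index(["MSA", "MSA", "EGY", "MSA"])
unmixed = code_mixing_index(["MSA", "MSA", "MSA"])
print(mixed, unmixed)
```

A score of 0 means every counted token carries the same label; the score rises as tokens are spread more evenly across labels.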
Pages: 2382-2386
Page count: 5