Approaches for Multilingual Phone Recognition in Code-switched and Non-code-switched Scenarios Using Indian Languages

Cited by: 0
Authors
Manjunath, K. E. [1 ]
Raghavan, Srinivasa K. M. [2 ]
Rao, K. Sreenivasa [3 ]
Jayagopi, Dinesh Babu [2 ]
Ramasubramanian, V [2 ]
Affiliations
[1] ISRO, Int Inst Informat Technol Bangalore, UR Rao Satellite Ctr, HAL Airport Rd, Bangalore 560017, Karnataka, India
[2] Int Inst Informat Technol, Elect City Phase 1, Bangalore 560100, Karnataka, India
[3] Indian Inst Technol Kharagpur, Kharagpur 721301, W Bengal, India
Keywords
Indian languages; multilingual phone recognition; LID-switched monolingual PRS; code-switching; common multilingual phone-set; SPEAKER; SYSTEM; ASR;
DOI
10.1145/3437256
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this study, we evaluate and compare two approaches to multilingual phone recognition in code-switched and non-code-switched scenarios. The first approach uses a front-end Language Identification (LID) system to switch to a monolingual phone recognizer (LID-Mono); one such recognizer is trained for each language in the multilingual dataset. In the second approach, a common multilingual phone-set derived from the International Phonetic Alphabet (IPA) transcription of the multilingual dataset is used to develop a Multilingual Phone Recognition System (Multi-PRS). The bilingual code-switching experiments are conducted using the Kannada and Urdu languages. In the first approach, LID is performed using state-of-the-art i-vectors. Both the monolingual and the multilingual phone recognition systems are trained using Deep Neural Networks. The performance of the LID-Mono and Multi-PRS approaches is compared and analysed in detail. The Multi-PRS approach is found to outperform the more conventional LID-Mono approach in both code-switched and non-code-switched scenarios. For code-switched speech, the effect of the length of the segments used to perform LID on the performance of the LID-Mono system is studied by varying the window size from 500 ms to 5.0 s, and by using the full utterance. The LID-Mono approach depends heavily on the accuracy of the LID system, and LID errors cannot be recovered. The Multi-PRS system, by contrast, requires no front-end LID switching and is designed around the common multilingual phone-set derived from several languages; it is therefore not constrained by the accuracy of the LID system and performs effectively on both code-switched and non-code-switched speech, offering lower Phone Error Rates than the LID-Mono system.
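The LID-switched pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the i-vector LID and the DNN phone recognizers are replaced by toy stand-ins, and every name here (`lid_mono_decode`, `toy_lid`, `toy_recognizers`, the language tags `kn` and `ur`, the float "frames") is a hypothetical placeholder. It only shows how each window is routed to one monolingual recognizer, and how a wrong routing decision shows up directly in the Phone Error Rate, computed here as edit distance divided by reference length.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical types: a "frame" is just a float here, standing in for an
# acoustic feature vector.
LidFn = Callable[[Sequence[float]], str]
PhoneRecognizer = Callable[[Sequence[float]], List[str]]


def split_into_windows(frames: Sequence[float], window_size: int) -> List[Sequence[float]]:
    """Cut an utterance into consecutive, non-overlapping windows."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    return [frames[i:i + window_size] for i in range(0, len(frames), window_size)]


def lid_mono_decode(
    frames: Sequence[float],
    window_size: int,
    identify_language: LidFn,
    recognizers: Dict[str, PhoneRecognizer],
) -> List[str]:
    """Run LID on each window and decode it with that language's recognizer."""
    phones: List[str] = []
    for window in split_into_windows(frames, window_size):
        language = identify_language(window)
        # The LID decision is final: a wrong label sends the whole window
        # to the wrong monolingual recognizer.
        phones.extend(recognizers[language](window))
    return phones


def phone_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """Levenshtein distance between phone sequences, divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    previous = list(range(len(hypothesis) + 1))
    for i, ref_phone in enumerate(reference, start=1):
        current = [i]
        for j, hyp_phone in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_phone != hyp_phone)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(reference)


# Toy stand-ins (assumptions): positive frames "are" Kannada, negative "are" Urdu.
def toy_lid(window: Sequence[float]) -> str:
    return "kn" if sum(window) / len(window) >= 0 else "ur"


toy_recognizers: Dict[str, PhoneRecognizer] = {
    "kn": lambda window: ["kn_phone"] * len(window),
    "ur": lambda window: ["ur_phone"] * len(window),
}

# A code-switched toy utterance: two Kannada frames followed by four Urdu frames.
utterance = [1.0, 1.0, -1.0, -1.0, -1.0, -1.0]
reference = ["kn_phone"] * 2 + ["ur_phone"] * 4

short_window = lid_mono_decode(utterance, 2, toy_lid, toy_recognizers)
full_utterance = lid_mono_decode(utterance, len(utterance), toy_lid, toy_recognizers)

per_short = phone_error_rate(reference, short_window)
per_full = phone_error_rate(reference, full_utterance)
```

In this toy setting, utterance-level LID labels the whole signal as one language and mis-decodes the Kannada portion, while the short window follows the switch. The toy LID is equally reliable at every window length, which real LID is not, so the sketch says nothing about which window size works best in practice.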
Pages: 19