Approaches for Multilingual Phone Recognition in Code-switched and Non-code-switched Scenarios Using Indian Languages

Cited by: 0

Authors:

Manjunath, K. E. [1]

Raghavan, Srinivasa K. M. [2]

Rao, K. Sreenivasa [3]

Jayagopi, Dinesh Babu [2]

Ramasubramanian, V [2]

Affiliations:

[1] ISRO, Int Inst Informat Technol Bangalore, UR Rao Satellite Ctr, HAL Airport Rd, Bangalore 560017, Karnataka, India

[2] Int Inst Informat Technol, Elect City Phase 1, Bangalore 560100, Karnataka, India

[3] Indian Inst Technol Kharagpur, Kharagpur 721301, W Bengal, India

Source:

ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING | 2021 / Vol. 20 / Issue 04

Keywords:

Indian languages; multilingual phone recognition; LID-switched monolingual PRS; code-switching; common multilingual phone-set; SPEAKER; SYSTEM; ASR;

DOI:

10.1145/3437256

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In this study, we evaluate and compare two approaches for multilingual phone recognition in code-switched and non-code-switched scenarios. The first approach is a front-end Language Identification (LID) system that switches to a monolingual phone recognizer (LID-Mono), with one recognizer trained individually on each language present in the multilingual dataset. In the second approach, a common multilingual phone-set derived from the International Phonetic Alphabet (IPA) transcription of the multilingual dataset is used to develop a Multilingual Phone Recognition System (Multi-PRS). The bilingual code-switching experiments are conducted using the Kannada and Urdu languages. In the first approach, LID is performed using state-of-the-art i-vectors. Both monolingual and multilingual phone recognition systems are trained using Deep Neural Networks. The performance of the LID-Mono and Multi-PRS approaches is compared and analysed in detail. The Multi-PRS approach is found to outperform the more conventional LID-Mono approach in both code-switched and non-code-switched scenarios. For code-switched speech, the effect of the length of the segments used to perform LID on the performance of the LID-Mono system is studied by varying the window size from 500 ms to 5.0 s, and up to the full utterance. The LID-Mono approach depends heavily on the accuracy of the LID system, and LID errors cannot be recovered. The Multi-PRS system, by virtue of not requiring front-end LID switching and being designed around the common multilingual phone-set derived from several languages, is not constrained by the accuracy of the LID system. It therefore performs effectively on both code-switched and non-code-switched speech, offering lower Phone Error Rates than the LID-Mono system.
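The LID-Mono pipeline and the Phone Error Rate metric described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`phone_error_rate`, `split_into_windows`, `lid_mono_decode`), the majority-vote LID stub, and the toy monolingual recognizers are all hypothetical stand-ins for the i-vector LID and DNN recognizers. Windows are counted in frames here, and the Phone Error Rate is assumed to be the usual edit distance divided by the reference length.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Frame = Tuple[str, str]  # toy frame: (true language, true phone); an assumption


def phone_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """(substitutions + deletions + insertions) / number of reference phones."""
    if not reference:
        raise ValueError("reference must not be empty")
    # Levenshtein distance over phone labels, keeping one row at a time.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_phone in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_phone in enumerate(hypothesis, start=1):
            cost = 0 if ref_phone == hyp_phone else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # match or substitution
        prev = curr
    return prev[-1] / len(reference)


def split_into_windows(frames: Sequence[Frame], window_len: int) -> List[Sequence[Frame]]:
    """Cut the utterance into consecutive, non-overlapping windows."""
    return [frames[k:k + window_len] for k in range(0, len(frames), window_len)]


def lid_mono_decode(
    frames: Sequence[Frame],
    window_len: int,
    identify_language: Callable[[Sequence[Frame]], str],
    recognizers: Dict[str, Callable[[Sequence[Frame]], List[str]]],
) -> List[str]:
    """Run LID on each window, then decode it with that language's recognizer."""
    phones: List[str] = []
    for window in split_into_windows(frames, window_len):
        language = identify_language(window)
        phones.extend(recognizers[language](window))
    return phones


# --- Hypothetical stubs, for illustration only ---
def toy_lid(window: Sequence[Frame]) -> str:
    """Stand-in LID: returns the most frequent language tag in the window."""
    return Counter(lang for lang, _ in window).most_common(1)[0][0]


def make_toy_recognizer(language: str) -> Callable[[Sequence[Frame]], List[str]]:
    """Stand-in monolingual recognizer: correct only on its own language."""
    return lambda window: [p if lang == language else "<unk>" for lang, p in window]


utterance: List[Frame] = [("kn", "a"), ("kn", "k"), ("kn", "a"),
                          ("ur", "s"), ("ur", "i"), ("ur", "t")]
reference = [phone for _, phone in utterance]
recognizers = {lang: make_toy_recognizer(lang) for lang in ("kn", "ur")}

# A window that straddles the switch point is decoded with one language only.
per_aligned = phone_error_rate(reference, lid_mono_decode(utterance, 3, toy_lid, recognizers))
per_straddling = phone_error_rate(reference, lid_mono_decode(utterance, 4, toy_lid, recognizers))
```

In this toy setup, a window that spans a language switch is routed to a single monolingual recognizer, so the phones of the other language in that window are lost. This mirrors the abstract's point that LID errors in LID-Mono cannot be recovered downstream.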

Pages: 19