Acoustic Feature Analysis and Discriminative Modeling for Language Identification of Closely Related South-Asian Languages

Cited by: 10
Authors
Adeeba, Farah [1 ]
Hussain, Sarmad [1 ]
Affiliations
[1] University of Engineering and Technology, Al-Khawarizmi Institute of Computer Science (KICS), Centre for Language Engineering (CLE), Lahore, Pakistan
Keywords
Speech corpus; Language identification; Urdu; Sindhi; Pashto; Punjabi; Prosodic features; Recognition; Speech
DOI
10.1007/s00034-017-0724-1
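Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Subject classification code
0808; 0809
Abstract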
With advances in technology, communication between people from different linguistic backgrounds around the world is steadily increasing, creating a need for language identification services. Language identification techniques extract distinguishing information from speech corpora as features of a language in order to differentiate one language from another. Without publicly available speech corpora, comparisons between different techniques are not very reliable. This paper investigates state-of-the-art features and techniques for language identification of under-resourced and closely related languages, namely Pashto, Punjabi, Sindhi, and Urdu. A speech corpus was designed and collected for these languages. The dataset consists of read speech collected over the telephone network (mobile and landline) from different regions of Pakistan. The corpus is annotated at the sentence level using X-SAMPA, its orthographic transcription is also provided, and the verified data are divided into training and evaluation sets. Mel-frequency cepstral coefficients (MFCCs) and their shifted delta cepstral (SDC) features are used to develop a language identification system for the target languages. Approaches based on the Gaussian mixture model with universal background model (GMM-UBM) and on i-vectors are investigated. The results show that GMM-UBM is more effective than the i-vector approach for language identification of short-duration test utterances.
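As an illustration of the shifted delta cepstral features named in the abstract, the following is a minimal sketch of the commonly used SDC construction, in which each frame's feature vector stacks k delta vectors computed at shifts of P frames with a delta spread of d. The function name, the default values d=1, P=3, k=7, and the edge-padding strategy are assumptions made for this example; the abstract does not state the configuration the authors used.

```python
import numpy as np

def shifted_delta_cepstra(cepstra, d=1, p=3, k=7):
    """Stack k shifted delta vectors per frame (hypothetical helper).

    cepstra: array of shape (num_frames, n_coeffs), e.g. MFCCs.
    Returns an array of shape (num_frames, n_coeffs * k).
    Defaults d=1, p=3, k=7 are assumed, not taken from the paper.
    """
    cepstra = np.asarray(cepstra, dtype=float)
    num_frames = cepstra.shape[0]
    # Repeat edge frames so every index t + i*p +/- d is valid (assumed choice).
    pad_before = d
    pad_after = (k - 1) * p + d
    padded = np.pad(cepstra, ((pad_before, pad_after), (0, 0)), mode="edge")
    blocks = []
    for i in range(k):
        start = pad_before + i * p
        ahead = padded[start + d : start + d + num_frames]    # c(t + i*p + d)
        behind = padded[start - d : start - d + num_frames]   # c(t + i*p - d)
        blocks.append(ahead - behind)
    return np.hstack(blocks)
```

In a pipeline like the one the abstract describes, the output would typically be concatenated with the static MFCCs before GMM-UBM or i-vector modeling; that step is not shown here.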
Pages: 3589-3604
Number of pages: 16