Chhattisgarhi speech corpus for research and development in automatic speech recognition

Cited by: 9
Authors
Londhe, Narendra D. [1]
Kshirsagar, Ghanahshyam B. [1]
Affiliations
[1] Natl Inst Technol, Dept Elect Engn, Raipur 492010, Chhattisgarh, India
Keywords
Chhattisgarhi; Speech corpus; Automatic speech recognition; Mel-frequency cepstral coefficients
DOI
10.1007/s10772-018-9496-7
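Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Subject classification code
0808; 0809
Abstract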
Automatic speech recognition (ASR) is a computerized interface that allows humans to communicate with machines through natural conversation. ASR has a wide range of applications in fields such as language development in young children, telecommunications, and assistive devices for the hearing impaired. The performance of an ASR system is greatly influenced by the database used for its implementation. In this paper, we discuss building a speech corpus for Chhattisgarhi, a rare but important Indian dialect. The speech corpus consists of 100 unique isolated words and four speech scripts totaling 67 sentences, recorded from a total of 478 native speakers. The words were selected from the English-to-Chhattisgarhi dictionary published by Chhattisgarh Rajbhasha Aayog, and the scripts from Chhattisgarhi literature and newspaper articles. The dataset was collected while travelling over 60% of the geographical area of the state of Chhattisgarh. As a result, a valuable speech corpus has been prepared for Chhattisgarhi for the first time, with the aim of advancing speech research. Speech recognition experiments on both isolated and continuous speech samples have been successfully demonstrated on the prepared database.
Pages: 193-210
Page count: 18