Analysis and recognition of spontaneous speech using Corpus of Spontaneous Japanese

Cited by: 17
Authors
Furui, S [1 ]
Nakamura, M [1 ]
Ichiba, T [1 ]
Iwano, K [1 ]
Affiliations
[1] Tokyo Inst Technol, Dept Comp Sci, Meguro Ku, Tokyo 1528552, Japan
Funding
Japan Science and Technology Agency
Keywords
spontaneous speech; Corpus of Spontaneous Japanese; automatic speech recognition; cepstrum; speaking rate
DOI
10.1016/j.specom.2005.02.010
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Although speech is in almost any situation spontaneous, recognition of spontaneous speech is an area which has only recently emerged in the field of automatic speech recognition. Broadening the application of speech recognition depends crucially on raising recognition performance for spontaneous speech. For this purpose, it is necessary to analyze and model spontaneous speech using spontaneous speech databases, since spontaneous speech and read speech are significantly different. This paper reports analysis and recognition of spontaneous speech using a large-scale spontaneous speech database "Corpus of Spontaneous Japanese (CSJ)". Recognition results in this experiment show that recognition accuracy significantly increases as a function of the size of acoustic as well as language model training data and the improvement levels off at approximately 7M words of training data. This means that acoustic and linguistic variation of spontaneous speech is so large that we need a very large corpus in order to encompass the variations. Spectral analysis using various styles of utterances in the CSJ shows that the spectral distribution/difference of phonemes is significantly reduced in spontaneous speech compared to read speech. It has also been observed that speaking rates of both vowels and consonants in spontaneous speech are significantly faster than those in read speech. © 2005 Elsevier B.V. All rights reserved.
Pages: 208-219
Page count: 12