Standardization of speech corpus

Authors
Li, Ai-Jun [1]
Yin, Zhi-Gang [1]
Affiliation
[1] Institute of Linguistics, Chinese Academy of Social Sciences, Beijing
Keywords
Data standardization; Interoperability; Phonetics; Speech corpus
DOI
10.2481/dsj.6.S806
Abstract
A speech corpus is the basis for analyzing the characteristics of speech signals and for developing speech synthesis and recognition systems. In China, almost all speech research and development institutions are building their own speech corpora. There are now so many Chinese speech corpora, of so many different kinds, that it is important to be able to share them conveniently, both to avoid wasting time and money and to make research more efficient. The primary goal of this research is to find a standard scheme that allows a corpus to be built more efficiently and to be used or shared more easily. RASC863 (a Regional Accent Speech Corpus funded by the National 863 Project), a large speech corpus of Chinese spoken with 10 regional accents, is used as an example to illustrate the standardization of speech corpus production.
Pages: S806-S812
Page count: 6