Prosodic Modeling in Large Vocabulary Mandarin Speech Recognition

Cited by: 0
Authors
Huang, Jui-Ting [1]
Lee, Lin-shan [1]
Affiliations
[1] Natl Taiwan Univ, Grad Inst Commun Engn, Taipei, Taiwan
Source
INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5 | 2006
Keywords
prosody; speech recognition; Mandarin tone; lexical word boundaries;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The issue of incorporating prosodic information into speech recognition processes has emerged in recent years. In this work we present a complete framework for Mandarin speech recognition with prosodic modeling considering two-level hierarchical prosodic information for Mandarin Chinese. We developed a GMM-based, a decision-tree-based, and a hybrid approach. The best improvements in character recognition accuracy were obtained by the decision-tree-based prosodic models. This approach does NOT require a training corpus labeled with prosodic features, and works reasonably for a large-scale multi-speaker task.
Pages: 1241-1244
Number of pages: 4
Related papers
9 entries in total
  • [1] BOURLARD H, IEEE T NEURAL NETWOR, V4, P893
  • [2] CHEN K, 2003, P EURO SPEECH GEN, P393
  • [3] CHEN SH, IEEE T COMMUNICATION, V38, P1317
  • [4] HIROSE K, 2004, P ICSLP
  • [5] HUANG JT, 2006, SPEECH PROSODY
  • [6] Prosody-based automatic segmentation of speech into sentences and topics
    Shriberg, E
    Stolcke, A
    Hakkani-Tür, D
    Tür, G
    [J]. SPEECH COMMUNICATION, 2000, 32 (1-2) : 127 - 154
  • [7] Stolcke A, 1999, P 6 EUR C SPEECH COM, V1, P307
  • [8] TSENG CY, SPEECH COMMUNICATION, V46, P284
  • [9] VERGYRI D, 2003, P ICASSP HONG KONG A, V1, P208