Context-dependent acoustic modeling using graphemes for large vocabulary speech recognition

Authors
Kanthak, S [1]
Ney, H [1]
Affiliations
[1] Rhein Westfal TH Aachen, Dept Comp Sci, Lehrstuhl Informat 6, D-52056 Aachen, Germany
DOI
Not available
Chinese Library Classification number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
In this paper we propose to use a decision tree based on graphemic acoustic sub-word units together with phonetic questions. We also show that automatic question generation can be used to completely eliminate any manual effort. We present experimental results on four corpora with different languages, namely the Dutch ARISE corpus, the Italian EUTRANS EVAL00 evaluation corpus, the German VERBMOBIL '00 development corpus and the English North American Business '94 20k and 64k development corpora. For all experiments, the acoustic models are trained from scratch in order not to use any prior phonetic knowledge. Complete training procedures have been iterated to simulate the long optimization history used for the phonemic acoustic models. With minimal manual effort we show that for the Dutch, German and Italian corpora, the presented approach works surprisingly well and increases the word error rate by not more than 2% relative. On the English NAB task the error rate is about 20% higher compared to experiments using a pronunciation lexicon.
Pages: 845-848
Number of pages: 4
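
The graphemic sub-word units mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that each letter of the orthography is one grapheme unit and that context dependency is expressed as a left/right neighbor triple. The function names (`grapheme_lexicon`, `trigraphemes`), the `#` word-boundary symbol, and the `left-center+right` label format are hypothetical choices made for this example. The decision-tree clustering of these units described in the abstract is not shown.

```python
def grapheme_lexicon(words):
    """Map each word to its grapheme sequence (assumption: one lowercase letter per unit)."""
    return {word: list(word.lower()) for word in words}


def trigraphemes(graphemes, boundary="#"):
    """Expand a grapheme sequence into context-dependent units.

    Each unit is labeled left-center+right; `boundary` marks the word edges.
    """
    padded = [boundary] + list(graphemes) + [boundary]
    return [
        f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]


# No pronunciation lexicon is needed: the spelling itself defines the units.
lexicon = grapheme_lexicon(["Aachen", "Ney"])
units = {word: trigraphemes(seq) for word, seq in lexicon.items()}

for word, seq in units.items():
    print(word, seq)
```

In a full system, units like these would then be tied by a decision tree so that rarely seen contexts share acoustic model parameters.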