Improved Recognition of Spontaneous Hungarian Speech: Morphological and Acoustic Modeling Techniques for a Less Resourced Task

Cited by: 22
Authors
Peter Mihajlik [1, 2]
Zoltan Tueske [1]
Balazs Tarjan [1]
Bottyan Nemeth [1]
Tibor Fegyo [1, 3]
Affiliations
[1] Budapest Univ Technol & Econ, Dept Telecommun & Media Informat, H-1117 Budapest, Hungary
[2] THINKTech Res Ctr Nonprofit LLC, H-2600 Vac, Hungary
[3] AITIA Int Inc, H-1039 Budapest, Hungary
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2010 / Vol. 18 / Issue 06
Keywords
Acoustic modeling; morphologically rich languages; speech recognition; spontaneous large-vocabulary continuous speech recognition (LVCSR); subword-based language modeling;
DOI
10.1109/TASL.2009.2038807
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Various morphological and acoustic modeling techniques are evaluated on a less resourced, spontaneous Hungarian large-vocabulary continuous speech recognition (LVCSR) task. Among morphologically rich languages, Hungarian is known for its agglutinative, inflective nature, which increases the data sparseness caused by a relatively small training database. Although Hungarian spelling is considered simple and phonological, a large part of the corpus is covered by words pronounced in multiple, phonemically different ways. Data-driven and language-specific, knowledge-supported vocabulary decomposition methods are investigated in combination with phoneme- and grapheme-based acoustic modeling techniques on the given task. Word baseline and morph-based advanced baseline results are significantly outperformed by using both statistical and grammatical vocabulary decomposition methods. Although the discussed morph-based techniques recognize a significant number of out-of-vocabulary words, the improvements are due not to this fact but to the reduction of insertion errors. Applying grapheme-based acoustic models instead of phoneme-based models causes no severe deterioration in recognition performance. Moreover, a fully data-driven acoustic modeling technique along with a statistical morphological modeling approach provides the best performance on the most difficult test set. The overall best speech recognition performance is obtained by using a novel word-to-morph decomposition technique that combines grammatical and unsupervised statistical segmentation algorithms. The improvement achieved by the proposed technique is stable across acoustic modeling approaches and larger with speaker adaptation.
Pages: 1588-1600
Page count: 13
Related papers
41 entries in total
  • [1] Afify Mohamed., 2006, INTERSPEECH-2006, P1444
  • [2] Arisoy E., 2007, P INT EUR ANTW BELG, P2381
  • [3] Turkish Broadcast News Transcription and Retrieval
    Arisoy, Ebru
    Can, Dogan
    Parlak, Siddika
    Sak, Hasim
    Saraclar, Murat
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2009, 17 (05): : 874 - 883
  • [4] Berton A, 1996, ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, P1165, DOI 10.1109/ICSLP.1996.607814
  • [5] Automatic recognition of spontaneous speech for access to multilingual oral history archives
    Byrne, W
    Doermann, D
    Franz, MT
    Gustman, S
    Hajic, J
    Oard, D
    Picheny, M
    Psutka, J
    Ramabhadran, B
    Soergel, D
    Ward, T
    Zhu, WJ
    [J]. IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2004, 12 (04): : 420 - 435
  • [6] CREUTZ M, 2005, A81 HELS U TECHN PUB
  • [7] CREUTZ M, 2005, P AKRR 05 ESP FINL J
  • [8] CREUTZ M, 2007, ACM T SPEECH LANG PR, V5
  • [9] Geutner P, 1998, INT CONF ACOUST SPEE, P925, DOI 10.1109/ICASSP.1998.675417
  • [10] HALACSY P, 2006, P EV MULT MULT INF R, V4730, P101