Syllable Based Language Model for Large Vocabulary Continuous Speech Recognition of Polish

Cited by: 0
Authors
Majewski, Piotr [1]
Affiliations
[1] Univ Lodz, Fac Math & Comp Sci, PL-90238 Lodz, Poland
Source
TEXT, SPEECH AND DIALOGUE, PROCEEDINGS | 2008 / Vol. 5246
Keywords
Polish; large vocabulary continuous speech recognition; language modeling; sub-word units; syllable-based units;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Most state-of-the-art large vocabulary continuous speech recognition systems use word-based n-gram language models. Such models are not an optimal solution for inflectional or agglutinative languages. Polish is a highly inflectional language and requires very large corpora to build an adequate language model with a small out-of-vocabulary ratio. We propose a syllable-based language model, which is better suited to a highly inflectional language like Polish. When resources are scarce (i.e., small corpora), the syllable-based model outperforms word-based models in terms of the number of out-of-vocabulary units (syllables in our model). Such a model is an approximation of the morpheme-based model for Polish. In our paper, we show the results of an evaluation of the syllable-based model and its usefulness in speech recognition tasks.
Pages: 397-401
Number of pages: 5