Syllable Based Language Model for Large Vocabulary Continuous Speech Recognition of Polish

Cited by: 0

Authors:

Majewski, Piotr [1]

Affiliations:

[1] Univ Lodz, Fac Math & Comp Sci, PL-90238 Lodz, Poland

Source:

TEXT, SPEECH AND DIALOGUE, PROCEEDINGS | 2008 / Vol. 5246

Keywords:

Polish; large vocabulary continuous speech recognition; language modeling; sub-word units; syllable-based units;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Most state-of-the-art large vocabulary continuous speech recognition systems use word-based n-gram language models. Such models are not an optimal solution for inflectional or agglutinative languages. Polish is a highly inflectional language and requires very large corpora to build an adequate language model with a small out-of-vocabulary ratio. We propose a syllable-based language model, which is better suited to a highly inflectional language like Polish. When resources are scarce (i.e. small corpora), the syllable-based model outperforms word-based models in terms of the number of out-of-vocabulary units (syllables in our model). Such a model is an approximation of a morpheme-based model for Polish. In this paper, we present the results of an evaluation of the syllable-based model and its usefulness in speech recognition tasks.


Pages: 397-401

Number of pages: 5
