Automatic Pronunciation Generator for Indonesian Speech Recognition System Based on Sequence-to-Sequence Model

Cited by: 0

Authors:

Hoesen, Devin [1]

Putri, Fanda Yuliana [1]

Lestari, Dessi Puji [2]

Affiliations:

[1] Prosa ai, Bandung, Indonesia

[2] Inst Teknol Bandung, Bandung, Indonesia

Source:

2019 22ND CONFERENCE OF THE ORIENTAL COCOSDA INTERNATIONAL COMMITTEE FOR THE CO-ORDINATION AND STANDARDISATION OF SPEECH DATABASES AND ASSESSMENT TECHNIQUES (O-COCOSDA) | 2019

Keywords:

Indonesian; pronunciation dictionary; sequence-to-sequence; speech recognition

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

A pronunciation dictionary plays an important role in a speech recognition system. Building an accurate dictionary by hand requires expert knowledge, because a pronunciation must be assigned to each word manually. As vocabulary size keeps growing, especially for the Indonesian language, assigning every pronunciation manually is impractical. Indonesian spelling-to-pronunciation rules are relatively regular, so it is plausible to produce a word's pronunciation from predefined rules. Nevertheless, the rules still contain a few irregularities for some spellings, and they cannot handle code-mixed words and abbreviations. In this paper, we employ a sequence-to-sequence (seq2seq) approach to generate the pronunciation of each word in an Indonesian dictionary. We demonstrate that this approach yields a similar speech recognition error rate while requiring only a fraction of the resources. Our cross-validation experiment for validating the resulting phonetic sequences achieves a phone error rate (PER) of 4.15-6.24%. When an automatically produced dictionary is applied in a speech recognition system, word accuracy degrades by only 2.22 percentage points compared with a manually produced dictionary. Therefore, creating a new large pronunciation dictionary with the proposed model is more efficient and does not degrade recognition accuracy significantly.
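
The abstract reports phone error rate (PER) without defining it. PER is conventionally computed as the edit distance between predicted and reference phone sequences, divided by the total number of reference phones. The sketch below illustrates that conventional definition only; it is not taken from the paper, the function names are hypothetical, and the phone symbols in the example are made up for illustration rather than drawn from the authors' phone set.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phone sequences (lists of strings)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion of a reference phone
                            curr[j - 1] + 1,      # insertion of a hypothesis phone
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phone_error_rate(pairs):
    """PER in percent over an iterable of (reference, hypothesis) pairs."""
    pairs = list(pairs)
    errors = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    total = sum(len(ref) for ref, _ in pairs)
    if total == 0:
        raise ValueError("reference sequences contain no phones")
    return 100.0 * errors / total


# Illustrative (assumed) pronunciations: one exact match, one with two errors.
pairs = [
    (["m", "a", "k", "a", "n"], ["m", "a", "k", "a", "n"]),
    (["b", "a", "ng"], ["b", "a", "n", "k"]),
]
print(phone_error_rate(pairs))  # 2 errors over 8 reference phones
```

Under this definition, a PER of 4.15-6.24% corresponds to roughly 4 to 6 phone-level errors per 100 reference phones.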


Pages: 7-12

Number of pages: 6
