Automatic Pronunciation Generator for Indonesian Speech Recognition System Based on Sequence-to-Sequence Model


Authors:

Hoesen, Devin [1]

Putri, Fanda Yuliana [1]

Lestari, Dessi Puji [2]

Affiliations:

[1] Prosa.ai, Bandung, Indonesia

[2] Institut Teknologi Bandung, Bandung, Indonesia

Source:

2019 22ND CONFERENCE OF THE ORIENTAL COCOSDA INTERNATIONAL COMMITTEE FOR THE CO-ORDINATION AND STANDARDISATION OF SPEECH DATABASES AND ASSESSMENT TECHNIQUES (O-COCOSDA) | 2019

Keywords:

Indonesian; pronunciation dictionary; sequence-to-sequence; speech recognition

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

A pronunciation dictionary plays an important role in a speech recognition system. Obtaining an accurate dictionary requires expert knowledge, because the pronunciation of each word must be specified manually. As vocabulary size keeps growing, especially for the Indonesian language, specifying every pronunciation by hand is impractical. Indonesian spelling-to-pronunciation rules are relatively regular, so it is plausible to derive a word's pronunciation from predefined rules. Nevertheless, the rules still contain a few irregularities for some spellings, and they cannot handle code-mixed words and abbreviations. In this paper, we employ a sequence-to-sequence (seq2seq) approach to generate the pronunciation of each word in an Indonesian dictionary. We demonstrate that this approach yields a similar speech recognition error rate while requiring only a fraction of the resources. Our cross-validation experiment for validating the resulting phonetic sequences achieves a phone error rate (PER) of 4.15-6.24%. When an automatically produced dictionary is used in a speech recognition system, word accuracy degrades by only 2.22 percentage points compared with a manually produced dictionary. Therefore, creating a new large pronunciation dictionary with the proposed model is more efficient and does not significantly degrade recognition accuracy.
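The phone error rate reported in the abstract is commonly computed as the edit distance between predicted and reference phone sequences, divided by the total number of reference phones. The abstract does not describe the authors' scoring procedure, so the following is only a minimal sketch under that assumption. The function names and the phone symbols in the example are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phone sequences (lists of str)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phone_error_rate(pairs):
    """PER in percent over (reference, hypothesis) phone-sequence pairs."""
    errors = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    total = sum(len(ref) for ref, _ in pairs)
    return 100.0 * errors / total


# Hypothetical example: two words, one substituted phone in the second.
pairs = [
    (["m", "a", "k", "a", "n"], ["m", "a", "k", "a", "n"]),
    (["b", "a", "ng", "k", "u"], ["b", "a", "n", "k", "u"]),
]
per = phone_error_rate(pairs)  # 1 error over 10 reference phones
```

Pooling errors over all reference phones, rather than averaging per-word rates, keeps long words from being under-weighted. Whether the paper pools or averages is not stated in the abstract.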


Pages: 7 - 12

Page count: 6
