Automatic Pronunciation Generator for Indonesian Speech Recognition System Based on Sequence-to-Sequence Model

Cited by: 0

Authors:

Hoesen, Devin [1]

Putri, Fanda Yuliana [1]

Lestari, Dessi Puji [2]

Affiliations:

[1] Prosa.ai, Bandung, Indonesia

[2] Inst Teknol Bandung, Bandung, Indonesia

Source:

2019 22ND CONFERENCE OF THE ORIENTAL COCOSDA INTERNATIONAL COMMITTEE FOR THE CO-ORDINATION AND STANDARDISATION OF SPEECH DATABASES AND ASSESSMENT TECHNIQUES (O-COCOSDA) | 2019

Keywords:

Indonesian; pronunciation dictionary; sequence-to-sequence; speech recognition

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

A pronunciation dictionary plays an important role in a speech recognition system. Expert knowledge is required to obtain an accurate dictionary by manually giving the pronunciation of each word. Because vocabulary size keeps increasing, especially for the Indonesian language, it is impractical to manually give the pronunciation of each word. Indonesian spelling-to-pronunciation rules are relatively regular; thus, it is plausible to produce the pronunciation of a word by using predefined rules. Nevertheless, the rules still contain a few irregularities for some spellings, and they cannot handle code-mixed words and abbreviations. In this paper, we employ a sequence-to-sequence (seq2seq) approach to generate the pronunciation of each word in an Indonesian dictionary. It is demonstrated that by using this approach, we can obtain a similar speech-recognition error rate while requiring only a fraction of the resources. Our cross-validation experiment for validating the resulting phonetic sequences achieves a 4.15-6.24% phone error rate (PER). When an automatically produced dictionary is applied in a speech recognition system, the word accuracy degrades by only 2.22 percentage points compared to the manually produced one. Therefore, creating a new large pronunciation dictionary using the proposed model is more efficient and does not degrade recognition accuracy significantly.
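
The abstract reports results as phone error rate (PER) but does not spell out how it is scored. A minimal sketch, assuming the conventional definition (total Levenshtein edit distance between reference and generated phone sequences, divided by the total number of reference phones), is given below. The function names, the example words, and the phone symbols are illustrative assumptions and are not taken from the paper.

```python
from typing import Iterable, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between a reference and a hypothesis phone sequence."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference phone missing in hypothesis
                curr[j - 1] + 1,     # insertion: extra phone in hypothesis
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def phone_error_rate(pairs: Iterable[Tuple[Sequence[str], Sequence[str]]]) -> float:
    """PER over (reference, hypothesis) pairs: total edits / total reference phones."""
    pairs = list(pairs)
    total_phones = sum(len(ref) for ref, _ in pairs)
    if total_phones == 0:
        raise ValueError("reference sequences contain no phones")
    total_edits = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    return total_edits / total_phones


# Hypothetical dictionary entries (illustrative phone symbols, not the paper's phone set).
example_pairs = [
    (["m", "a", "k", "a", "n"], ["m", "a", "k", "a", "n"]),  # generated correctly
    (["t", "a", "h", "u", "n"], ["t", "a", "u", "n"]),       # one phone dropped
]
print(f"PER = {phone_error_rate(example_pairs):.2%}")  # PER = 10.00%
```

Under this definition, one dropped phone out of ten reference phones gives a PER of 10%. The paper's own scoring procedure may differ in details such as phone-set normalization.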


Pages: 7 / 12

Page count: 6

