Automatic Pronunciation Generator for Indonesian Speech Recognition System Based on Sequence-to-Sequence Model

Cited by: 0
Authors
Hoesen, Devin [1]
Putri, Fanda Yuliana [1]
Lestari, Dessi Puji [2]
Affiliations
[1] Prosa ai, Bandung, Indonesia
[2] Inst Teknol Bandung, Bandung, Indonesia
Source
2019 22ND CONFERENCE OF THE ORIENTAL COCOSDA INTERNATIONAL COMMITTEE FOR THE CO-ORDINATION AND STANDARDISATION OF SPEECH DATABASES AND ASSESSMENT TECHNIQUES (O-COCOSDA) | 2019
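Keywords
Indonesian; pronunciation dictionary; sequence-to-sequence; speech recognition
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract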
A pronunciation dictionary plays an important role in a speech recognition system. Expert knowledge is required to obtain an accurate dictionary by manually giving the pronunciation of each word. Because vocabulary size keeps growing, especially for the Indonesian language, it is impractical to give the pronunciation of each word manually. Indonesian spelling-to-pronunciation rules are relatively regular; thus, it is plausible to produce the pronunciation of a word by using predefined rules. Nevertheless, the rules still contain a few irregularities for some spellings, and they cannot handle code-mixed words and abbreviations. In this paper, we employ a sequence-to-sequence (seq2seq) approach to generate the pronunciation of each word in an Indonesian dictionary. We demonstrate that this approach yields a similar speech-recognition error rate while requiring only a fraction of the resources. Our cross-validation experiment for validating the resulting phonetic sequences achieves a 4.15-6.24% phone error rate (PER). When an automatically produced dictionary is applied in a speech recognition system, the word accuracy degrades by only 2.22 percentage points compared to a manually produced dictionary. Therefore, creating a new large pronunciation dictionary using the proposed model is more efficient and does not degrade recognition accuracy significantly.
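The abstract reports a phone error rate (PER) but does not define it. The sketch below assumes the conventional definition: the total Levenshtein (edit) distance between predicted and reference phone sequences, divided by the total number of reference phones. The function names and the example phone symbols are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phone sequences."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # substitution or match
        prev = curr
    return prev[-1]


def phone_error_rate(refs: Sequence[Sequence[str]],
                     hyps: Sequence[Sequence[str]]) -> float:
    """PER in percent (assumed definition: total edits / total reference phones)."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same number of entries")
    total_phones = sum(len(r) for r in refs)
    if total_phones == 0:
        raise ValueError("reference set contains no phones")
    total_errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return 100.0 * total_errors / total_phones


# Hypothetical reference and predicted pronunciations for two words.
refs = [["m", "a", "k", "a", "n"], ["b", "u", "k", "u"]]
hyps = [["m", "a", "k", "a", "n"], ["b", "u", "k", "o"]]
print(f"PER = {phone_error_rate(refs, hyps):.2f}%")
```

Here one substitution over nine reference phones gives a PER of about 11.11%. Note that PER measures phone-level errors in the dictionary itself, whereas the 2.22-percentage-point figure in the abstract is a difference in word accuracy of the downstream recognizer.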
Pages: 7-12
Number of pages: 6
Related papers
50 in total
  • [41] Evaluation of pronunciation by means of automatic speech recognition system for computer aided Indonesian language learning
    Indrayanti, Linda
    Usagawa, Tsuyoshi
    Chisaki, Yoshifumi
    Dutono, Titon
    2006 7TH INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY BASED HIGHER EDUCATION AND TRAINING, VOLS 1 AND 2, 2006, : 571 - 574
  • [42] Semi-supervised Training for Sequence-to-Sequence Speech Recognition Using Reinforcement Learning
    Chung, Hoon
    Jeon, Hyeong-Bae
    Park, Jeon Gue
    2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020,
  • [43] RASR2: The RWTH ASR Toolkit for Generic Sequence-to-sequence Speech Recognition
    Zhou, Wei
    Beck, Eugen
    Berger, Simon
    Schlueter, Ralf
    Ney, Hermann
    INTERSPEECH 2023, 2023, : 4094 - 4098
  • [44] Low-Latency Sequence-to-Sequence Speech Recognition and Translation by Partial Hypothesis Selection
    Liu, Danni
    Spanakis, Gerasimos
    Niehues, Jan
    INTERSPEECH 2020, 2020, : 3620 - 3624
  • [45] Sequence-to-sequence Modelling for Categorical Speech Emotion Recognition Using Recurrent Neural Network
    Chen, Xiaomin
    Han, Wenjing
    Ruan, Huabin
    Liu, Jiamu
    Li, Haifeng
    Jiang, Dongmei
    2018 FIRST ASIAN CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION (ACII ASIA), 2018,
  • [46] Sequence-to-Sequence Contrastive Learning for Text Recognition
    Aberdam, Aviad
    Litman, Ron
    Tsiper, Shahar
    Anschel, Oron
    Slossberg, Ron
    Mazor, Shai
    Manmatha, R.
    Perona, Pietro
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 15297 - 15307
  • [47] Intrusion Prediction With System-Call Sequence-to-Sequence Model
    Lv, Shaohua
    Wang, Jian
    Yang, Yinqi
    Liu, Jiqiang
    IEEE ACCESS, 2018, 6 : 71413 - 71421
  • [48] Controlling Sequence-to-Sequence Models - A Demonstration on Neural-based Acrostic Generator
    Shen, Liang-Hsin
    Tai, Pei-Lun
    Wu, Chao-Chung
    Lin, Shou-De
    2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF SYSTEM DEMONSTRATIONS, 2019, : 43 - 48
  • [49] Enhancing Sequence-to-Sequence Text-to-Speech with Morphology
    Taylor, Jason
    Richmond, Korin
    INTERSPEECH 2020, 2020, : 1738 - 1742
  • [50] Using Character-Level Sequence-to-Sequence Model for Word Level Text Generation to Enhance Arabic Speech Recognition
    Azim, Mona A.
    Hussein, Wedad
    Badr, Nagwa L.
    IEEE ACCESS, 2023, 11 : 91173 - 91183