Automatic Pronunciation Generator for Indonesian Speech Recognition System Based on Sequence-to-Sequence Model

Cited by: 0
Authors
Hoesen, Devin [1]
Putri, Fanda Yuliana [1]
Lestari, Dessi Puji [2]
Affiliations
[1] Prosa ai, Bandung, Indonesia
[2] Inst Teknol Bandung, Bandung, Indonesia
Source
2019 22ND CONFERENCE OF THE ORIENTAL COCOSDA INTERNATIONAL COMMITTEE FOR THE CO-ORDINATION AND STANDARDISATION OF SPEECH DATABASES AND ASSESSMENT TECHNIQUES (O-COCOSDA), 2019
Keywords
Indonesian; pronunciation dictionary; sequence-to-sequence; speech recognition
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A pronunciation dictionary plays an important role in a speech recognition system. Expert knowledge is required to obtain an accurate dictionary by manually giving a pronunciation for each word. Because vocabulary size keeps increasing, especially for the Indonesian language, it is impractical to give the pronunciation for each word manually. Indonesian spelling-to-pronunciation rules are relatively regular; thus, it is plausible to produce the pronunciation of a word by using predefined rules. Nevertheless, the rules still contain a few irregularities for some spellings, and they cannot handle code-mixed words and abbreviations. In this paper, we employ a sequence-to-sequence (seq2seq) approach to generate a pronunciation for each word in an Indonesian dictionary. We demonstrate that this approach yields a similar speech-recognition error rate while requiring only a fraction of the resources. Our cross-validation experiment for validating the resulting phonetic sequences achieves a 4.15-6.24% phone error rate (PER). When an automatically produced dictionary is applied in a speech recognition system, the word accuracy degrades by only 2.22 percentage points compared to a manually produced one. Therefore, creating a new large pronunciation dictionary using the proposed model is more efficient and does not significantly degrade the recognition accuracy.
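The abstract reports its results as a phone error rate (PER). This record does not state how the authors scored it, so the following is only a minimal sketch assuming the conventional definition: total edit distance between reference and generated phone sequences, divided by the total number of reference phones. The function names and the example phone sequences are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phone sequences."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion of a reference phone
                            curr[j - 1] + 1,      # insertion of an extra phone
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phone_error_rate(refs: Sequence[Sequence[str]],
                     hyps: Sequence[Sequence[str]]) -> float:
    """PER in percent over paired reference and hypothesis pronunciations."""
    total_errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_phones = sum(len(r) for r in refs)
    return 100.0 * total_errors / total_phones


# Hypothetical example: two dictionary entries, one with a dropped final phone.
refs = [["m", "a", "k", "a", "n"], ["b", "u", "k", "u"]]
hyps = [["m", "a", "k", "a", "n"], ["b", "u", "k"]]
print(f"PER = {phone_error_rate(refs, hyps):.2f}%")
```

Under this definition, one error against nine reference phones gives a PER of about 11.11% for the toy example.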
Pages: 7-12
Page count: 6