Speech Corpus of Ainu Folklore and End-to-end Speech Recognition for Ainu Language

Cited by: 0
Authors
Matsuura, Kohei [1]
Ueno, Sei [1]
Mimura, Masato [1]
Sakai, Shinsuke [1]
Kawahara, Tatsuya [1]
Affiliations
[1] Kyoto Univ, Grad Sch Informat, Sakyo Ku, Kyoto 6068501, Japan
Keywords
Ainu speech corpus; low-resource language; end-to-end speech recognition; JAPANESE;
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Ainu is an unwritten language spoken by the Ainu people, one of the ethnic groups in Japan. UNESCO recognizes it as critically endangered, so archiving and documenting its linguistic heritage is of paramount importance. Although a considerable number of voice recordings of Ainu folklore have been produced and accumulated to preserve the culture, only a very limited portion of them has been transcribed so far. We therefore started a project on automatic speech recognition (ASR) for the Ainu language in order to contribute to the development of annotated language archives. In this paper, we report on the development of the speech corpus and on the structure and performance of end-to-end ASR for Ainu. We investigated four modeling units (phone, syllable, word piece, and word) and found that the syllable-based model performed best in terms of both word and phone recognition accuracy, which were about 60% and over 85%, respectively, in the speaker-open condition. Furthermore, word and phone accuracies of 80% and 90%, respectively, were achieved in the speaker-closed setting. We also found that multilingual ASR training with additional English and Japanese speech corpora further improves the speaker-open test accuracy.
Pages: 2622-2628
Number of pages: 7