Speech Corpus of Ainu Folklore and End-to-end Speech Recognition for Ainu Language

Cited by: 0

Authors:

Matsuura, Kohei [1]

Ueno, Sei [1]

Mimura, Masato [1]

Sakai, Shinsuke [1]

Kawahara, Tatsuya [1]

Affiliation:

[1] Kyoto Univ, Grad Sch Informat, Sakyo Ku, Kyoto 6068501, Japan

Source:

PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020) | 2020

Keywords:

Ainu speech corpus; low-resource language; end-to-end speech recognition; JAPANESE;

DOI:

Not available

Chinese Library Classification number:

TP39 [Computer applications];

Discipline classification code:

081203 ; 0835 ;

Abstract:

Ainu is an unwritten language spoken by the Ainu people, one of the ethnic groups in Japan. It is recognized as critically endangered by UNESCO, and archiving and documenting its language heritage is of paramount importance. Although a considerable number of voice recordings of Ainu folklore have been produced and accumulated to preserve the culture, only a very limited portion of them has been transcribed so far. We therefore started a project on automatic speech recognition (ASR) for the Ainu language in order to contribute to the development of annotated language archives. In this paper, we report on the development of a speech corpus and on the structure and performance of end-to-end ASR for Ainu. We investigated four modeling units (phone, syllable, word piece, and word) and found that the syllable-based model performed best in terms of both word and phone recognition accuracy, which were about 60% and over 85%, respectively, in the speaker-open condition. Furthermore, word and phone accuracies of 80% and 90% were achieved in a speaker-closed setting. We also found that multilingual ASR training with additional speech corpora of English and Japanese further improves the speaker-open test accuracy.


Pages: 2622-2628

Number of pages: 7
