A New Corpus of Elderly Japanese Speech for Acoustic Modeling, and a Preliminary Investigation of Dialect-Dependent Speech Recognition

Authors
Fukuda, Meiko [1 ]
Nishimura, Ryota [1 ]
Nishizaki, Hiromitsu [2 ]
Iribe, Yurie [3 ]
Kitaoka, Norihide [4 ]
Affiliations
[1] Tokushima Univ, Dept Comp Sci, Tokushima, Japan
[2] Univ Yamanashi, Fac Engn, Grad Sch Interdisciplinary Res, Kofu, Yamanashi, Japan
[3] Aichi Prefectural Univ, Sch Informat Sci & Technol, Nagakute, Aichi, Japan
[4] Toyohashi Univ Technol, Dept Comp Sci & Engn, Toyohashi, Aichi, Japan
Source
2019 22ND CONFERENCE OF THE ORIENTAL COCOSDA INTERNATIONAL COMMITTEE FOR THE CO-ORDINATION AND STANDARDISATION OF SPEECH DATABASES AND ASSESSMENT TECHNIQUES (O-COCOSDA) | 2019
Keywords
elderly; Japanese; corpus; speech recognition; adaptation; dialect; DEMENTIA;
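DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract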
We have constructed a new speech corpus consisting of utterances from 221 elderly Japanese speakers (average age: 79.2) with the aim of improving the accuracy of automatic speech recognition (ASR) for the elderly. ASR is a beneficial modality for people with impaired vision or limited hand movement, including the elderly. However, speech recognition systems using standard recognition models, especially standard acoustic models, have been unable to achieve satisfactory performance for elderly speakers. Creating more accurate acoustic models of elderly speech is therefore essential for improving speech recognition for this group. Using our new corpus, which includes the speech of elderly people living in three regions of Japan, we conducted speech recognition experiments with a variety of DNN-HMM acoustic models. As training data for the acoustic models, we examined whether a standard adult Japanese speech corpus (JNAS), an elderly speech corpus (S-JNAS), or a spontaneous speech corpus (CSJ) was most suitable, and whether adaptation to the dialect of each region improved recognition results. We adapted each of the three acoustic models to all of our speech data and then re-adapted them using speech from each region. Without adaptation, the best recognition results were obtained with the S-JNAS-trained acoustic models (entire corpus: 21.85% word error rate (WER)). However, after adapting the acoustic models to our entire corpus, the CSJ-trained models achieved the lowest WERs (entire corpus: 17.42%). Moreover, after re-adaptation to each regional dialect, the CSJ-trained acoustic model adapted to regional speech data tended to show improved recognition rates. We plan to collect more utterances from all over Japan so that our corpus can serve as a key resource for elderly speech recognition in Japanese, and we hope to achieve further improvements in recognition performance for elderly speech.
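The results above are reported as word error rates. As a rough illustration only (not the authors' scoring tool), WER is commonly computed as the word-level Levenshtein distance between a reference and a recognizer hypothesis, (S + D + I) / N, where S, D and I are substitutions, deletions and insertions and N is the number of reference words. The sketch below assumes the Japanese text has already been segmented into words (Japanese is written without spaces, so a morphological analyzer would normally do this); the romanized sentence is a hypothetical toy example. The helper `relative_reduction` is also illustrative: applied to the two figures quoted above (21.85% for the best non-adapted S-JNAS model, 17.42% for the best adapted CSJ model), it gives a relative reduction of roughly 20%. This is our own arithmetic comparing two different systems, not a figure stated in the abstract.

```python
def word_error_rate(reference: list[str], hypothesis: list[str]) -> float:
    """WER = (S + D + I) / N via word-level Levenshtein distance."""
    n, m = len(reference), len(hypothesis)
    if n == 0:
        raise ValueError("reference must contain at least one word")
    # dp[i][j]: minimum edits turning reference[:i] into hypothesis[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(m + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # match or substitution
            )
    return dp[n][m] / n


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline WER removed by the new system."""
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    ref = ["kyou", "wa", "ii", "tenki", "desu"]  # hypothetical reference
    hyp = ["kyou", "wa", "tenki", "desu"]        # one word deleted
    print(word_error_rate(ref, hyp))             # 1 edit / 5 words
    print(round(relative_reduction(21.85, 17.42), 3))
```

Using integer edit counts divided by the reference length keeps the metric comparable across utterances of different lengths; corpus-level WER is usually obtained by summing edits and reference words over all utterances before dividing, rather than averaging per-utterance rates.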
Pages: 78-83
Number of pages: 6