A New Corpus of Elderly Japanese Speech for Acoustic Modeling, and a Preliminary Investigation of Dialect-Dependent Speech Recognition

Cited by: 0
Authors
Fukuda, Meiko [1 ]
Nishimura, Ryota [1 ]
Nishizaki, Hiromitsu [2 ]
Iribe, Yurie [3 ]
Kitaoka, Norihide [4 ]
Affiliations
[1] Tokushima Univ, Dept Comp Sci, Tokushima, Japan
[2] Univ Yamanashi, Fac Engn, Grad Sch Interdisciplinary Res, Kofu, Yamanashi, Japan
[3] Aichi Prefectural Univ, Sch Informat Sci & Technol, Nagakute, Aichi, Japan
[4] Toyohashi Univ Technol, Dept Comp Sci & Engn, Toyohashi, Aichi, Japan
Source
2019 22ND CONFERENCE OF THE ORIENTAL COCOSDA INTERNATIONAL COMMITTEE FOR THE CO-ORDINATION AND STANDARDISATION OF SPEECH DATABASES AND ASSESSMENT TECHNIQUES (O-COCOSDA) | 2019
Keywords
elderly; Japanese; corpus; speech recognition; adaptation; dialect; dementia
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
We have constructed a new speech data corpus consisting of the utterances of 221 elderly Japanese people (average age: 79.2) with the aim of improving the accuracy of automatic speech recognition (ASR) for the elderly. ASR is a beneficial modality for people with impaired vision or limited hand movement, including the elderly. However, speech recognition systems using standard recognition models, especially acoustic models, have been unable to achieve satisfactory performance for the elderly. Thus, creating more accurate acoustic models of the speech of elderly users is essential for improving speech recognition for the elderly. Using our new corpus, which includes the speech of elderly people living in three regions of Japan, we conducted speech recognition experiments using a variety of DNN-HMM acoustic models. As training data for our acoustic models, we examined whether a standard adult Japanese speech corpus (JNAS), an elderly speech corpus (S-JNAS) or a spontaneous speech corpus (CSJ) was most suitable, and whether or not adaptation to the dialect of each region improved recognition results. We adapted each of our three acoustic models to all of our speech data, and then re-adapted them using speech from each region. Without adaptation, the best recognition results were obtained with the S-JNAS-trained acoustic models (entire corpus: 21.85% word error rate). However, after adaptation of our acoustic models to our entire corpus, the CSJ-trained models achieved the lowest WERs (entire corpus: 17.42%). Moreover, after re-adaptation to each regional dialect, the CSJ-trained acoustic models tended to show further improvements in recognition rates. We plan to collect more utterances from all over Japan, so that our corpus can be used as a key resource for elderly speech recognition in Japanese. We also hope to achieve further improvement in recognition performance for elderly speech.
Pages: 78-83
Number of pages: 6
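The abstract above reports recognition performance as word error rate (WER), e.g. 21.85% for the S-JNAS-trained models without adaptation and 17.42% for the CSJ-trained models after adaptation to the entire elderly corpus. As a point of reference only, the sketch below shows a generic, self-contained WER computation in Python; it is not code from the paper, the example transcripts are hypothetical, and English words are used for readability whereas the paper scores segmented Japanese.

```python
# Minimal sketch of word error rate (WER): word-level edit distance
# divided by the number of reference words. Generic reference
# implementation, not taken from the paper; example strings are invented.

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate = (substitutions + insertions + deletions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming edit distance over words
    # (substitutions, insertions and deletions all cost 1).
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    # Hypothetical example: one reference transcript, two recognizer outputs.
    reference   = "the weather in tokushima is fine today"
    hyp_base    = "the weather tokushima is find today"       # 1 deletion + 1 substitution
    hyp_adapted = "the weather in tokushima is fine today"    # exact match
    print(f"base model WER:    {wer(reference, hyp_base):.2%}")     # ~28.57%
    print(f"adapted model WER: {wer(reference, hyp_adapted):.2%}")  # 0.00%
```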