A New Corpus of Elderly Japanese Speech for Acoustic Modeling, and a Preliminary Investigation of Dialect-Dependent Speech Recognition

Times Cited: 0
Authors
Fukuda, Meiko [1 ]
Nishimura, Ryota [1 ]
Nishizaki, Hiromitsu [2 ]
Iribe, Yurie [3 ]
Kitaoka, Norihide [4 ]
Affiliations
[1] Tokushima Univ, Dept Comp Sci, Tokushima, Japan
[2] Univ Yamanashi, Fac Engn, Grad Sch Interdisciplinary Res, Kofu, Yamanashi, Japan
[3] Aichi Prefectural Univ, Sch Informat Sci & Technol, Nagakute, Aichi, Japan
[4] Toyohashi Univ Technol, Dept Comp Sci & Engn, Toyohashi, Aichi, Japan
Source
2019 22ND CONFERENCE OF THE ORIENTAL COCOSDA INTERNATIONAL COMMITTEE FOR THE CO-ORDINATION AND STANDARDISATION OF SPEECH DATABASES AND ASSESSMENT TECHNIQUES (O-COCOSDA) | 2019
Keywords
elderly; Japanese; corpus; speech recognition; adaptation; dialect; dementia
DOI
Not available
Chinese Library Classification
TP18 [Theory of Artificial Intelligence]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
We have constructed a new speech data corpus consisting of the utterances of 221 elderly Japanese people (average age: 79.2), with the aim of improving the accuracy of automatic speech recognition (ASR) for the elderly. ASR is a beneficial modality for people with impaired vision or limited hand movement, including the elderly. However, speech recognition systems using standard recognition models, especially acoustic models, have been unable to achieve satisfactory performance for elderly speakers. Thus, creating more accurate acoustic models of elderly speech is essential for improving speech recognition for the elderly. Using our new corpus, which includes the speech of elderly people living in three regions of Japan, we conducted speech recognition experiments with a variety of DNN-HMM acoustic models. As training data for our acoustic models, we examined whether a standard adult Japanese speech corpus (JNAS), an elderly speech corpus (S-JNAS) or a spontaneous speech corpus (CSJ) was most suitable, and whether adaptation to the dialect of each region improved recognition results. We adapted each of the three acoustic models to all of our speech data, and then re-adapted them using the speech from each region. Without adaptation, the best recognition results were obtained with the S-JNAS-trained acoustic models (21.85% word error rate on the entire corpus). However, after adaptation to our entire corpus, the CSJ-trained models achieved the lowest WERs (17.42% on the entire corpus). Moreover, after re-adaptation to each regional dialect, the CSJ-trained acoustic models adapted to regional speech data tended to show further improvement in recognition rates. We plan to collect more utterances from all over Japan, so that our corpus can serve as a key resource for Japanese elderly speech recognition, and we hope to achieve further improvements in recognition performance for elderly speech.
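The abstract describes a two-stage adaptation scheme: an acoustic model trained on a base corpus (JNAS, S-JNAS or CSJ) is first adapted to the entire elderly corpus and then re-adapted to each region's speech. The Python sketch below illustrates this idea for the DNN component of a hybrid DNN-HMM model. It is not the authors' implementation; the network shape, feature dimension, senone count, dummy data loaders and hyperparameters are assumptions made only for illustration.

```python
# Minimal sketch (assumed, not the authors' code) of two-stage acoustic model
# adaptation: base model -> adapt to whole elderly corpus -> re-adapt per region.
import copy
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

FEAT_DIM, NUM_SENONES = 40, 2000  # assumed feature dimension / tied-state targets

def make_dnn() -> nn.Module:
    """Simple feed-forward DNN over acoustic frames, as in a DNN-HMM hybrid."""
    return nn.Sequential(
        nn.Linear(FEAT_DIM, 512), nn.ReLU(),
        nn.Linear(512, 512), nn.ReLU(),
        nn.Linear(512, NUM_SENONES),
    )

def adapt(model: nn.Module, loader: DataLoader,
          epochs: int = 3, lr: float = 1e-4) -> nn.Module:
    """Fine-tune all DNN weights on adaptation data (cross-entropy over senones)."""
    model = copy.deepcopy(model)            # keep the parent model unchanged
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for feats, senone_ids in loader:
            opt.zero_grad()
            loss = loss_fn(model(feats), senone_ids)
            loss.backward()
            opt.step()
    return model

def dummy_loader(n: int) -> DataLoader:
    """Random data standing in for (frame features, forced-alignment senone labels)."""
    feats = torch.randn(n, FEAT_DIM)
    labels = torch.randint(0, NUM_SENONES, (n,))
    return DataLoader(TensorDataset(feats, labels), batch_size=32, shuffle=True)

base_model = make_dnn()                                    # stands in for a CSJ-trained model
elderly_model = adapt(base_model, dummy_loader(1024))      # stage 1: entire elderly corpus
regional_model = adapt(elderly_model, dummy_loader(256))   # stage 2: one region's speech
```

In practice each regional model would be evaluated by decoding that region's test set and comparing word error rates against the stage-1 model, mirroring the comparison reported in the abstract.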
Pages: 78-83
Number of Pages: 6