A Performance Comparison of Commercial Speech Recognition APIs in Noisy Environments

Cited by: 0
Authors
Lee G. [2]
Lee S. [2]
Ji S. [3]
Kim A. [1,3]
Im H. [1,3]
Affiliations
[1] Dept. of Computer Science and Engineering, Dept. of Convergence Security, Interdisciplinary Graduate Program in Medical Bigdata Convergence, Kangwon National University
[2] Dept. of Convergence Security, Kangwon National University
[3] Interdisciplinary Graduate Program in Medical Bigdata Convergence, Kangwon National University
Funding
National Research Foundation of Singapore
Keywords
Character error rate; Noisy environment; Speech recognition; Word error rate
DOI
10.5370/KIEE.2022.71.9.1266
Abstract
This paper compares the performance of five commercial speech recognition APIs in noisy environments: those provided by Amazon AWS, Microsoft Azure, Google, Kakao, and Naver. To this end, we used an open dataset, provided by AI Hub, for the development and evaluation of multi-channel noise processing technology. We tested each API's performance with respect to the speaker's gender, the speaker's location, and the speech content, and measured error rates using both word error rate (WER) and character error rate (CER). Except for the AWS API, the error rate was higher for female speakers' data than for male speakers' data, and higher for data recorded from the side than from the front. The error rate was also relatively high when the test sentences contained proper nouns such as personal names and place names, and the shorter the sentence, the higher the error rate. Moreover, the Google API outperformed all the others in terms of both WER and CER, with error rates of 53% and 18%, respectively. © 2022 Korean Institute of Electrical Engineers. All rights reserved.
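For readers unfamiliar with the two metrics named in the abstract, the following is a minimal sketch of how WER and CER are commonly computed: the Levenshtein (edit) distance between the reference transcript and the API output, divided by the reference length in words or characters. The function names, the `ignore_spaces` option, and the English example sentences are illustrative assumptions; the abstract does not describe the authors' actual scoring procedure or any text normalization they applied.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences, counting
    substitutions, insertions, and deletions as cost 1 each."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis, ignore_spaces=True):
    """Character error rate: character-level edit distance / reference length.
    Dropping spaces is an assumption made here, not something the abstract states."""
    if ignore_spaces:
        reference = reference.replace(" ", "")
        hypothesis = hypothesis.replace(" ", "")
    if not reference:
        raise ValueError("reference must contain at least one character")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    ref = "the cat sat on the mat"   # hypothetical reference transcript
    hyp = "the cat sit on mat"       # hypothetical API output
    print(f"WER = {wer(ref, hyp):.3f}")
    print(f"CER = {cer(ref, hyp):.3f}")
```

Because both metrics are normalized by the reference length, a single misrecognized word weighs more heavily in a short sentence than in a long one (one wrong word out of two gives a WER of 0.5). This is one arithmetic factor that may contribute to the higher error rates reported for shorter sentences, though the abstract does not attribute the effect to any particular cause.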
Pages: 1266-1273
Number of pages: 7