A Performance Comparison of Commercial Speech Recognition APIs in Noisy Environments

Cited by: 0
Authors
Lee G. [2]
Lee S. [2]
Ji S. [3]
Kim A. [1, 3]
Im H. [1, 3]
Affiliations
[1] Dept. of Computer Science and Engineering, Dept. of Convergence Security, Interdisciplinary Graduate Program in Medical Bigdata Convergence, Kangwon National University
[2] Dept. of Convergence Security, Kangwon National University
[3] Interdisciplinary Graduate Program in Medical Bigdata Convergence, Kangwon National University
Funding
National Research Foundation of Singapore
Keywords
Character error rate; Noisy environment; Speech recognition; Word error rate;
DOI
10.5370/KIEE.2022.71.9.1266
Abstract
This paper compares the performance of five commercial speech recognition APIs in noisy environments: those provided by Amazon AWS, Microsoft Azure, Google, Kakao, and Naver. To this end, we used an open dataset, provided on AI Hub, for the development and evaluation of multi-channel noise processing technology. We tested each API's performance with respect to the speaker's gender and location and the speech content, and measured the error rate using both word error rate (WER) and character error rate (CER). Except for the AWS API, the error rate was higher for female speakers' data than for male speakers' data, and higher for data recorded from the side than from the front. The error rate was also relatively high when the test sentences contained proper nouns such as personal names and place names, and it increased as the sentences became shorter. Moreover, the Google API outperformed all the others in terms of both WER and CER, with error rates of 53% and 18%, respectively. © 2022 Korean Institute of Electrical Engineers. All rights reserved.
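The two metrics named in the abstract can be illustrated with a minimal sketch. It assumes the common definitions (Levenshtein edit distance divided by the reference length, counted in words for WER and in characters for CER) and assumes that spaces are dropped before computing CER; the abstract does not state the paper's exact text normalization, so the function names and these choices are illustrative only.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hyp prefix
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate, with spaces removed (an assumption)."""
    ref_chars = list(reference.replace(" ", ""))
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    hyp_chars = list(hypothesis.replace(" ", ""))
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    hypothesis = "the cat sit on mat"
    print(f"WER = {wer(reference, hypothesis):.3f}")
    print(f"CER = {cer(reference, hypothesis):.3f}")
```

In this toy pair, one substituted word and one dropped word give a WER of 2/6, while the same errors cost only 4 of 17 characters. A word that is wrong by a single character counts as a full word error, which is one reason WER is often higher than CER for the same transcript.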
Pages: 1266-1273
Number of pages: 7