Evolution of the performance of automatic speech recognition algorithms in transcribing conversational telephone speech

Cited by: 0
Authors
Padmanabhan, M [1]
Saon, G [1]
Zweig, G [1]
Huang, J [1]
Kingsbury, B [1]
Mangu, L [1]
Affiliation
[1] IBM Corp, Thomas J Watson Res Ctr, Yorktown Hts, NY 10598 USA
Source
IMTC/2001: PROCEEDINGS OF THE 18TH IEEE INSTRUMENTATION AND MEASUREMENT TECHNOLOGY CONFERENCE, VOLS 1-3: REDISCOVERING MEASUREMENT IN THE AGE OF INFORMATICS | 2001
Keywords
speech recognition; spontaneous speech; telephone speech; discriminant transforms; boosting; consensus; formant frequencies; spectral peaks
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification code
081104; 0812; 0835; 1405
Abstract
Research in the speech recognition (speech-to-text conversion) area has been underway for a couple of decades, and a great deal of progress has been made in reducing the word error rate (WER). In this paper, we attempt to summarize the state of the art in speech recognition algorithms. The algorithms we describe span the areas of lexicon design, feature extraction, classifier design, combination of hypotheses, and speaker adaptation of acoustic models. We benchmark the algorithms on two main sources of speech, the first being Voicemail (conversational telephone speech from a single speaker) and the second being Switchboard (conversational telephone speech between two speakers). We also present the results of some cross-domain experiments, which highlight the "brittleness" of speech recognition systems today and illustrate the need to focus research effort on improving cross-domain performance.
Pages: 1926-1931
Page count: 4
Related papers
50 in total
  • [31] The AhoSR Automatic Speech Recognition System
    Odriozola, Igor
    Serrano, Luis
    Hernaez, Inma
    Navas, Eva
    ADVANCES IN SPEECH AND LANGUAGE TECHNOLOGIES FOR IBERIAN LANGUAGES, IBERSPEECH 2014, 2014, 8854 : 279 - 288
  • [32] EARS: Electromyographical automatic recognition of speech
    Jou, Szu-Chen Stan
    Schultz, Tanja
    BIOSIGNALS 2008: PROCEEDINGS OF THE FIRST INTERNATIONAL CONFERENCE ON BIO-INSPIRED SYSTEMS AND SIGNAL PROCESSING, VOL 1, 2008, : 3 - +
  • [33] Automatic Recognition of Anger in Spontaneous Speech
    Neiberg, Daniel
    Elenius, Kjell
    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 2755 - 2758
  • [34] Automatic intelligibility assessment of pathologic speech over the telephone
    Haderlein, Tino
    Noeth, Elmar
    Batliner, Anton
    Eysholdt, Ulrich
    Rosanowski, Frank
    LOGOPEDICS PHONIATRICS VOCOLOGY, 2011, 36 (04) : 175 - 181
  • [35] Detecting breathing sounds in realistic Japanese telephone conversations and its application to automatic speech recognition
    Fukuda, Takashi
    Ichikawa, Osamu
    Nishimura, Masafumi
    SPEECH COMMUNICATION, 2018, 98 : 95 - 103
  • [36] Unsupervised Text-to-Speech Synthesis by Unsupervised Automatic Speech Recognition
    Ni, Junrui
    Wang, Liming
    Gao, Heting
    Qian, Kaizhi
    Zhang, Yang
    Chang, Shiyu
    Hasegawa-Johnson, Mark
    INTERSPEECH 2022, 2022, : 461 - 465
  • [37] Comparing Humans and Automatic Speech Recognition Systems in Recognizing Dysarthric Speech
    Mengistu, Kinfe Tadesse
    Rudzicz, Frank
    ADVANCES IN ARTIFICIAL INTELLIGENCE, 2011, 6657 : 291 - 300
  • [38] A Study on Speech Coders for Automatic Speech Recognition in Adverse Communication Environments
    Choi, Seung Ho
    INFORMATICS ENGINEERING AND INFORMATION SCIENCE, PT II, 2011, 252 : 67 - 75
  • [39] Auditory driven subband speech enhancement for automatic recognition of noisy speech
    Upadhyay N.
    Rosales H.G.
    International Journal of Speech Technology, 2016, 19 (4) : 869 - 880
  • [40] Handling of true and pseudo wideband speech signals in automatic speech recognition
    Ben Salah, Mohamed-Ali
    Monne, Jean
    Jouvet, Denis
    Andre-Obrecht, Regine
    PROCEEDINGS OF THE 8TH WSEAS INTERNATIONAL CONFERENCE ON SIGNAL, SPEECH AND IMAGE PROCESSING (SSIP '08): SIGNAL, SPEECH AND IMAGE PROCESSING, 2008, : 39 - +