Evolution of the performance of automatic speech recognition algorithms in transcribing conversational telephone speech

Cited by: 0
Authors
Padmanabhan, M [1]
Saon, G [1]
Zweig, G [1]
Huang, J [1]
Kingsbury, B [1]
Mangu, L [1]
Affiliations
[1] IBM Corp, Thomas J Watson Res Ctr, Yorktown Hts, NY 10598 USA
Source
IMTC/2001: PROCEEDINGS OF THE 18TH IEEE INSTRUMENTATION AND MEASUREMENT TECHNOLOGY CONFERENCE, VOLS 1-3: REDISCOVERING MEASUREMENT IN THE AGE OF INFORMATICS | 2001
Keywords
speech recognition; spontaneous speech; telephone speech; discriminant transforms; boosting; consensus; formant frequencies; spectral peaks;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Research in the speech recognition (speech-to-text conversion) area has been underway for a couple of decades, and a great deal of progress has been made in reducing the word error rate (WER). In this paper, we attempt to summarize the state of the art in speech recognition algorithms. The algorithms we describe span the areas of lexicon design, feature extraction, classifier design, combination of hypotheses, and speaker adaptation of acoustic models. We benchmark the algorithms on two main sources of speech, the first being Voicemail (conversational telephone speech from a single speaker) and the second being Switchboard (conversational telephone speech between two speakers). We also present the results of some cross-domain experiments that highlight the "brittleness" of speech recognition systems today and illustrate the need to focus research effort on improving cross-domain performance.
Pages: 1926-1931
Page count: 4
Related papers
50 in total
  • [41] Robust telephone speech recognition based on channel compensation
    Han, JQ
    Gao, W
    PATTERN RECOGNITION, 1999, 32 (06) : 1061 - 1067
  • [42] Usage, performance, and satisfaction outcomes for experienced users of automatic speech recognition
    Koester, HH
    JOURNAL OF REHABILITATION RESEARCH AND DEVELOPMENT, 2004, 41 (05) : 739 - 754
  • [43] Model Adaptation for Automatic Speech Recognition Based on Multiple Time Scale Evolution
    Watanabe, Shinji
    Nakamura, Atsushi
    Juang, Biing-Hwang
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 1088 - +
  • [44] A CONVERSATIONAL NEURAL LANGUAGE MODEL FOR SPEECH RECOGNITION IN DIGITAL ASSISTANTS
    Cho, Eunjoon
    Kumar, Shankar
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5784 - 5788
  • [45] Conversational Speech Recognition Needs Data? Experiments with Austrian German
    Linke, Julian
    Garner, Philip N.
    Kubin, Gernot
    Schuppler, Barbara
    LREC 2022: THIRTEEN INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 4684 - 4691
  • [46] THE ROYALFLUSH AUTOMATIC SPEECH DIARIZATION AND RECOGNITION SYSTEM FOR IN-CAR MULTI-CHANNEL AUTOMATIC SPEECH RECOGNITION CHALLENGE
    Tian, Jingguang
    Ye, Shuaishuai
    Chen, Shunfei
    Xiang, Yang
    Yin, Zhaohui
    Hu, Xinhui
    Xu, Xinkang
    2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING WORKSHOPS, ICASSPW 2024, 2024, : 1 - 2
  • [47] A Speech Recognition-based Solution for the Automatic Detection of Mild Cognitive Impairment from Spontaneous Speech
    Toth, Laszlo
    Hoffmann, Ildiko
    Gosztolya, Gabor
    Vincze, Veronika
    Szatloczki, Greta
    Banreti, Zoltan
    Pakaski, Magdolna
    Kalman, Janos
    CURRENT ALZHEIMER RESEARCH, 2018, 15 (02) : 130 - 138
  • [48] Automatic Speech Recognition Used for Intelligibility Assessment of Text-to-Speech Systems
    Vich, Robert
    Nouza, Jan
    Vondra, Martin
    VERBAL AND NONVERBAL FEATURES OF HUMAN-HUMAN AND HUMAN-MACHINE INTERACTIONS, 2008, 5042 : 136 - +
  • [49] The Multi-level Approach to Speech Corpora Annotation for Automatic Speech Recognition
    Glavatskih, Igor
    Platonova, Tatyana
    Rogozhina, Valeria
    Shirokova, Anna
    Smolina, Anna
    Kotov, Mikhail
    Ovsyannikova, Anna
    Repalov, Sergey
    Zulkarneev, Mikhail
    SPEECH AND COMPUTER (SPECOM 2015), 2015, 9319 : 438 - 445
  • [50] Improving Automatic Speech Recognition and Speech Translation via Word Embedding Prediction
    Chuang, Shun-Po
    Liu, Alexander H.
    Sung, Tzu-Wei
    Lee, Hung-yi
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29 : 93 - 105