Evolution of the performance of automatic speech recognition algorithms in transcribing conversational telephone speech

Citations: 0
Authors
Padmanabhan, M [1]
Saon, G [1]
Zweig, G [1]
Huang, J [1]
Kingsbury, B [1]
Mangu, L [1]
Affiliation
[1] IBM Corp, Thomas J Watson Res Ctr, Yorktown Hts, NY 10598 USA
Source
IMTC/2001: PROCEEDINGS OF THE 18TH IEEE INSTRUMENTATION AND MEASUREMENT TECHNOLOGY CONFERENCE, VOLS 1-3: REDISCOVERING MEASUREMENT IN THE AGE OF INFORMATICS | 2001
Keywords
speech recognition; spontaneous speech; telephone speech; discriminant transforms; boosting; consensus; formant frequencies; spectral peaks
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Research in the speech recognition (speech-to-text conversion) area has been underway for a couple of decades, and a great deal of progress has been made in reducing the word error rate (WER). In this paper, we attempt to summarize the state of the art in speech recognition algorithms. The algorithms we describe span the areas of lexicon design, feature extraction, classifier design, combination of hypotheses, and speaker adaptation of acoustic models. We benchmark the algorithms on two main sources of speech, the first being Voicemail (conversational telephone speech from a single speaker) and the second being Switchboard (conversational telephone speech between two speakers). We also present the results of some cross-domain experiments, which highlight the "brittleness" of today's speech recognition systems and illustrate the need to focus research effort on improving cross-domain performance.
Pages: 1926-1931
Page count: 4
Related papers
50 in total
  • [1] Noise-Robust speech recognition of Conversational Telephone Speech
    Chen, Gang
    Tolba, Hesham
    O'Shaughnessy, Douglas
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006: 1101-1104
  • [2] Performance Analysis of Various Single Channel Speech Enhancement Algorithms for Automatic Speech Recognition
    Song, Myung-Suk
    Lee, Chang-Heon
    Kang, Hong-Goo
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006: 1451-1454
  • [3] Automatic speech recognition services in common telephone network
    Karpov, A
    Ronzhin, A
    Proceedings of the Second IASTED International Multi-Conference on Automation, Control, and Information Technology - Signal and Image Processing, 2005: 220-225
  • [4] Channel normalization techniques for automatic speech recognition over the telephone
    de Veth, J
    Boves, L
    SPEECH COMMUNICATION, 1998, 25 (1-3): 149-164
  • [5] Discriminative Approach to Build Hybrid Vocabulary for Conversational Telephone Speech Recognition of Agglutinative Languages
    Li, Xin
    Pan, Jielin
    Zhao, Qingwei
    Yan, Yonghong
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2013, E96D (11): 2478-2482
  • [6] On the limit of English conversational speech recognition
    Tuske, Zoltan
    Saon, George
    Kingsbury, Brian
    INTERSPEECH 2021, 2021: 2062-2066
  • [7] SOME INSIGHTS FROM TRANSLATING CONVERSATIONAL TELEPHONE SPEECH
    Kumar, Gaurav
    Post, Matt
    Povey, Daniel
    Khudanpur, Sanjeev
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014
  • [8] Robust speech detection method for telephone speech recognition system
    Kuroiwa, S
    Naito, M
    Yamamoto, S
    Higuchi, N
    SPEECH COMMUNICATION, 1999, 27 (2): 135-148
  • [9] Automatic speech recognition and speech variability: A review
    Benzeghiba, M.
    De Mori, R.
    Deroo, O.
    Dupont, S.
    Erbes, T.
    Jouvet, D.
    Fissore, L.
    Laface, P.
    Mertins, A.
    Ris, C.
    Rose, R.
    Tyagi, V.
    Wellekens, C.
    SPEECH COMMUNICATION, 2007, 49 (10-11): 763-786
  • [10] AUTOMATIC RECOGNITION OF WIDEBAND TELEPHONE SPEECH WITH LIMITED AMOUNT OF MATCHED TRAINING DATA
    Bauer, Patrick
    Abel, Johannes
    Fischer, Volker
    Fingscheidt, Tim
    2014 PROCEEDINGS OF THE 22ND EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2014: 1232-1236