Evolution of the performance of automatic speech recognition algorithms in transcribing conversational telephone speech

Cited by: 0
Authors
Padmanabhan, M [1]
Saon, G [1]
Zweig, G [1]
Huang, J [1]
Kingsbury, B [1]
Mangu, L [1]
Affiliations
[1] IBM Corp, Thomas J Watson Res Ctr, Yorktown Hts, NY 10598 USA
Keywords
speech recognition; spontaneous speech; telephone speech; discriminant transforms; boosting; consensus; formant frequencies; spectral peaks
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Research in the speech recognition (speech-to-text conversion) area has been underway for a couple of decades, and a great deal of progress has been made in reducing the word error rate (WER). In this paper, we attempt to summarize the state of the art in speech recognition algorithms. The algorithms we describe span the areas of lexicon design, feature extraction, classifier design, combination of hypotheses, and speaker adaptation of acoustic models. We benchmark the algorithms on two main sources of speech, the first being Voicemail (conversational telephone speech from a single speaker) and the second being Switchboard (conversational telephone speech between two speakers). We also present the results of some cross-domain experiments, which highlight the "brittleness" of speech recognition systems today and illustrate the need to focus research effort on improving cross-domain performance.
Pages: 1926 - 1931
Page count: 4
Related papers
50 in total
  • [1] Generating and evaluating segmentations for automatic speech recognition of conversational telephone speech
    Tranter, SE
    Yu, K
    Evermann, G
    Woodland, PC
    2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS: SPEECH PROCESSING, 2004, : 753 - 756
  • [2] Conversational telephone speech recognition
    Gauvain, JL
    Lamel, L
    Schwenk, H
    Adda, G
    Chen, L
    Lefèvre, F
    2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS: SPEECH PROCESSING I, 2003, : 212 - 215
  • [3] Automatic transcription of conversational telephone speech
    Hain, T
    Woodland, PC
    Evermann, G
    Gales, MJF
    Liu, XY
    Moore, GL
    Povey, D
    Wang, L
    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2005, 13 (06): : 1173 - 1185
  • [4] Improvements in recognition of conversational telephone speech
    Peskin, B
    Newman, M
    McAllaster, D
    Nagesha, V
    Richards, H
    Wegmann, S
    Hunt, M
    Gillick, L
    ICASSP '99: 1999 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS VOLS I-VI, 1999, : 53 - 56
  • [5] Conversational telephone speech recognition for Lithuanian
    Lileikyte, Rasa
    Lamel, Lori
    Gauvain, Jean-Luc
    Gorin, Arseniy
    COMPUTER SPEECH AND LANGUAGE, 2018, 49 : 71 - 82
  • [6] Automatic Speech Recognition of Conversational Speech in Individuals With Disordered Speech
    Tobin, Jimmy
    Nelson, Phillip
    MacDonald, Bob
    Heywood, Rus
    Cave, Richard
    Seaver, Katie
    Desjardins, Antoine
    Jiang, Pan-Pan
    Green, Jordan R.
    JOURNAL OF SPEECH LANGUAGE AND HEARING RESEARCH, 2024, 67 (11): : 4176 - 4185
  • [7] Recognition of conversational telephone speech using the JANUS speech engine
    Zeppenfeld, T
    Finke, M
    Ries, K
    Westphal, M
    Waibel, A
    1997 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I - V: VOL I: PLENARY, EXPERT SUMMARIES, SPECIAL, AUDIO, UNDERWATER ACOUSTICS, VLSI; VOL II: SPEECH PROCESSING; VOL III: SPEECH PROCESSING, DIGITAL SIGNAL PROCESSING; VOL IV: MULTIDIMENSIONAL SIGNAL PROCESSING, NEURAL NETWORKS - VOL V: STATISTICAL SIGNAL AND ARRAY PROCESSING, APPLICATIONS, 1997, : 1815 - 1818
  • [8] Noise-Robust speech recognition of Conversational Telephone Speech
    Chen, Gang
    Tolba, Hesham
    O'Shaughnessy, Douglas
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 1101 - 1104
  • [9] Trapping conversational speech: Extending trap/tandem approaches to conversational telephone speech recognition
    Morgan, N
    Chen, BY
    Zhu, QF
    Stolcke, A
    2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS: SPEECH PROCESSING, 2004, : 537 - 540
  • [10] Improving English Conversational Telephone Speech Recognition
    Medennikov, Ivan
    Prudnikov, Alexey
    Zatvornitskiy, Alexander
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 2 - 6