Evolution of the performance of automatic speech recognition algorithms in transcribing conversational telephone speech

Cited by: 0
Authors
Padmanabhan, M [1]
Saon, G [1]
Zweig, G [1]
Huang, J [1]
Kingsbury, B [1]
Mangu, L [1]
Affiliation
[1] IBM Corp, Thomas J Watson Res Ctr, Yorktown Hts, NY 10598 USA
Keywords
speech recognition; spontaneous speech; telephone speech; discriminant transforms; boosting; consensus; formant frequencies; spectral peaks
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Research in speech recognition (speech-to-text conversion) has been underway for a couple of decades, and a great deal of progress has been made in reducing the word error rate (WER). In this paper, we attempt to summarize the state of the art in speech recognition algorithms. The algorithms we describe span lexicon design, feature extraction, classifier design, combination of hypotheses, and speaker adaptation of acoustic models. We benchmark the algorithms on two main sources of speech: Voicemail (conversational telephone speech from a single speaker) and Switchboard (conversational telephone speech between two speakers). We also present the results of some cross-domain experiments, which highlight the "brittleness" of today's speech recognition systems and illustrate the need to focus research effort on improving cross-domain performance.
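The abstract measures progress by word error rate (WER). As a minimal sketch, assuming the standard definition (substitutions S, deletions D and insertions I taken from a minimum-edit-distance alignment of hypothesis words against reference words, divided by the number of reference words N), WER can be computed as below. This is not the scoring tool used in the paper, and the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (S + D + I) / N using a word-level Levenshtein alignment."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # d[i][j] = minimum number of edits turning the first i reference
    # words into the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub_cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,             # deletion
                d[i][j - 1] + 1,             # insertion
                d[i - 1][j - 1] + sub_cost,  # substitution or match
            )

    return d[len(ref)][len(hyp)] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over six
# reference words gives 2 / 6.
print(round(word_error_rate("the cat sat on the mat", "the cat sit on mat"), 3))  # 0.333
```

Because insertions are counted against the reference length, WER can exceed 1.0 when a recognizer produces many spurious words.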
Pages: 1926-1931
Page count: 4
Related papers
50 in total
  • [31] Dialogue act modeling for automatic tagging and recognition of conversational speech
    Stolcke, A
    Ries, K
    Coccaro, N
    Shriberg, E
    Bates, R
    Jurafsky, D
    Taylor, P
    Martin, R
    Van Ess-Dykema, C
    Meteer, M
    COMPUTATIONAL LINGUISTICS, 2000, 26 (03) : 339 - 373
  • [32] A cross-channel modeling approach for automatic segmentation of conversational telephone speech
    Liu, DB
    Kubala, F
    ASRU'03: 2003 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING ASRU '03, 2003, : 333 - 338
  • [33] Design and use of speech recognition algorithms for a mobile radio telephone
    Dobler, S
    Geller, D
    Haebumbach, R
    Meyer, P
    Ney, H
    Ruehl, HW
    SPEECH COMMUNICATION, 1993, 12 (03) : 221 - 229
  • [34] Channel normalization techniques for automatic speech recognition over the telephone
    de Veth, J
    Boves, L
    SPEECH COMMUNICATION, 1998, 25 (1-3) : 149 - 164
  • [35] Unsupervised training of subspace Gaussian mixture models for conversational telephone speech recognition
    Ma, Zejun
    Wang, Xiaorui
    Xu, Bo
    2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2012, : 4829 - 4832
  • [37] Improved Single System Conversational Telephone Speech Recognition with VGG Bottleneck Features
    Hartmann, William
    Hsiao, Roger
    Ng, Tim
    Ma, Jeff
    Keith, Francis
    Siu, Man-Hung
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 112 - 116
  • [38] Spoken language recognition in conversational telephone speech and TV broadcast news (GLOSA)
    Javier Rodriguez-Fuentes, Luis
    Varona, Amparo
    Penagarikano, Mikel
    Diez, Mireia
    Bordel, German
    PROCESAMIENTO DEL LENGUAJE NATURAL, 2011, (47): : 349 - 350
  • [39] Automatic Speech Recognition With Very Large Conversational Finnish and Estonian Vocabularies
    Enarvi, Seppo
    Smit, Peter
    Virpioja, Sami
    Kurimo, Mikko
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2017, 25 (11) : 2085 - 2097
  • [40] Speech production and automatic speech recognition
    Acoustics Bulletin, 2000, 25 (02):