Evolution of the performance of automatic speech recognition algorithms in transcribing conversational telephone speech

Citations: 0
Authors
Padmanabhan, M [1]
Saon, G [1]
Zweig, G [1]
Huang, J [1]
Kingsbury, B [1]
Mangu, L [1]
Affiliation
[1] IBM Corp, Thomas J Watson Res Ctr, Yorktown Hts, NY 10598 USA
Keywords
speech recognition; spontaneous speech; telephone speech; discriminant transforms; boosting; consensus; formant frequencies; spectral peaks;
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Research in the speech recognition (speech-to-text conversion) area has been underway for a couple of decades, and a great deal of progress has been made in reducing the word error rate (WER). In this paper, we attempt to summarize the state of the art in speech recognition algorithms. The algorithms we describe span the areas of lexicon design, feature extraction, classifier design, combination of hypotheses, and speaker adaptation of acoustic models. We benchmark the algorithms on two main sources of speech, the first being Voicemail (conversational telephone speech from a single speaker) and the second being Switchboard (conversational telephone speech between two speakers). We also present the results of some cross-domain experiments, which highlight the "brittleness" of speech recognition systems today and illustrate the need to focus research effort on improving cross-domain performance.
Pages: 1926-1931
Page count: 4