Evolution of the performance of automatic speech recognition algorithms in transcribing conversational telephone speech

Authors
Padmanabhan, M [1]
Saon, G [1]
Zweig, G [1]
Huang, J [1]
Kingsbury, B [1]
Mangu, L [1]
Affiliation
[1] IBM Corp, Thomas J Watson Res Ctr, Yorktown Hts, NY 10598 USA
Keywords
speech recognition; spontaneous speech; telephone speech; discriminant transforms; boosting; consensus; formant frequencies; spectral peaks
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Research in the speech recognition (speech-to-text conversion) area has been underway for a couple of decades, and a great deal of progress has been made in reducing the word error rate (WER). In this paper, we attempt to summarize the state of the art in speech recognition algorithms. The algorithms we describe span the areas of lexicon design, feature extraction, classifier design, combination of hypotheses, and speaker adaptation of acoustic models. We benchmark the algorithms on two main sources of speech, the first being Voicemail (conversational telephone speech from a single speaker) and the second being Switchboard (conversational telephone speech between two speakers). We also present the results of some cross-domain experiments, which highlight the "brittleness" of speech recognition systems today and illustrate the need to focus research effort on improving cross-domain performance.
Pages: 1926-1931
Page count: 4