INCORPORATING WRITTEN DOMAIN NUMERIC GRAMMARS INTO END-TO-END CONTEXTUAL SPEECH RECOGNITION SYSTEMS FOR IMPROVED RECOGNITION OF NUMERIC SEQUENCES

Authors
Haynor, Ben [1]
Aleksic, Petar S. [1]
Affiliation
[1] Google LLC, Mountain View, CA 94043 USA
Keywords
Speech recognition; RNN-T; end-to-end; contextual ASR; FSTs
DOI
10.1109/icassp40776.2020.9054259
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
Accurate recognition of numeric sequences is crucial for many contextual speech recognition applications. For example, a user might create a calendar event and be prompted by a virtual assistant for the time, date, and duration of the event. We propose a modular and scalable solution for improved recognition of numeric sequences. We use finite state transducers built from written domain numeric grammars to increase the likelihood of hypotheses containing matching numeric entities during beam search in an end-to-end speech recognition system. Our technique yields a relative reduction in word error rate of up to 59% on a variety of numeric sequence recognition tasks (times, percentages, digit sequences, ...).
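The biasing idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the hand-written character-level acceptor for times, the fixed `bonus` value, and the function names `build_time_acceptor`, `matches`, `biased_score`, and `rerank` are all assumptions made for illustration. The sketch is also simplified to rescoring complete hypotheses, whereas the abstract describes applying the bias during beam search.

```python
# Illustrative sketch only. The acceptor, the bonus value, and every name
# below are hypothetical; none of them come from the paper.

DIGITS = "0123456789"


def build_time_acceptor():
    """Build a tiny acceptor for written-domain times: H:MM or HH:MM.

    Returns (transitions, final_states). It does not validate the hour
    range, so a string like "99:59" is also accepted.
    """
    transitions = {}
    for d in DIGITS:
        transitions[(0, d)] = 1  # first hour digit
        transitions[(1, d)] = 2  # optional second hour digit
        transitions[(4, d)] = 5  # units digit of the minutes
    for d in "012345":
        transitions[(3, d)] = 4  # tens digit of the minutes
    transitions[(1, ":")] = 3
    transitions[(2, ":")] = 3
    return transitions, {5}


def matches(token, acceptor):
    """Return True if the acceptor consumes the whole token and ends in a final state."""
    transitions, finals = acceptor
    state = 0
    for ch in token:
        state = transitions.get((state, ch))
        if state is None:
            return False
    return state in finals


def biased_score(hypothesis, model_log_prob, acceptor, bonus=2.0):
    """Add a fixed bonus for each whitespace-separated token the grammar accepts."""
    n_matches = sum(matches(tok, acceptor) for tok in hypothesis.split())
    return model_log_prob + bonus * n_matches


def rerank(beam, acceptor, bonus=2.0):
    """Sort (text, log_prob) hypotheses by biased score, best first."""
    return sorted(
        beam,
        key=lambda h: biased_score(h[0], h[1], acceptor, bonus),
        reverse=True,
    )


# Made-up beam: the spoken-form hypothesis starts with the higher model score.
acceptor = build_time_acceptor()
beam = [("wake me at seven thirty", -3.0), ("wake me at 7:30", -4.0)]
best_text, _ = rerank(beam, acceptor)[0]
```

With these made-up scores, the hypothesis containing the written-domain time `7:30` receives the bonus and moves ahead of the spoken-form hypothesis. A real system would compile the grammar into a weighted finite state transducer and apply it to partial hypotheses at each decoding step.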
Pages: 7809-7813
Number of pages: 5