INCORPORATING WRITTEN DOMAIN NUMERIC GRAMMARS INTO END-TO-END CONTEXTUAL SPEECH RECOGNITION SYSTEMS FOR IMPROVED RECOGNITION OF NUMERIC SEQUENCES

Cited by: 0

Authors:

Haynor, Ben [1]

Aleksic, Petar S. [1]

Affiliation:

[1] Google LLC, Mountain View, CA 94043 USA

Source:

2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING | 2020

Keywords:

Speech recognition; RNN-T; end-to-end; contextual ASR; FSTs

DOI:

10.1109/icassp40776.2020.9054259

Chinese Library Classification:

O42 [Acoustics]

Subject classification codes:

070206; 082403

Abstract:

Accurate recognition of numeric sequences is crucial for many contextual speech recognition applications. For example, a user might create a calendar event and be prompted by a virtual assistant for the time, date, and duration of the event. We propose a modular and scalable solution for improved recognition of numeric sequences. We use finite state transducers built from written domain numeric grammars to increase the likelihood of hypotheses containing matching numeric entities during beam search in an end-to-end speech recognition system. Using our technique results in a relative reduction in word error rate of up to 59% on a variety of numeric sequence recognition tasks (times, percentages, digit sequences, and others).
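To make the idea in the abstract concrete, the following is a minimal Python sketch of biasing a beam search toward hypotheses that match a written-domain numeric grammar. It is an illustration under stated assumptions, not the authors' implementation: the grammar is a toy trie of clock times ("H:MM") standing in for a real FST, the per-character `bonus` and all function names (`build_time_trie`, `advance`, `biased_beam_search`) are hypothetical, and the "model" is just a hand-written list of per-step character log-probabilities.

```python
import math

def build_time_trie():
    """Toy written-domain grammar: every string 'H:MM' for hours 1-12.
    A trie is used here as a stand-in for a finite state transducer."""
    trie = {}
    for hour in range(1, 13):
        for minute in range(60):
            node = trie
            for ch in f"{hour}:{minute:02d}":
                node = node.setdefault(ch, {})
            node["<final>"] = True
    return trie

def advance(trie, node, ch):
    """Follow ch from the current grammar state.
    On failure, try restarting from the root; otherwise reset to the root."""
    if ch in node:
        return node[ch], True
    if ch in trie:
        return trie[ch], True
    return trie, False

def biased_beam_search(steps, trie, bonus=1.0, beam_size=3):
    """steps: list of {char: log_prob} dicts, one per output position.
    Each hypothesis carries its own grammar state; a matching character
    earns `bonus` on top of the model's log-probability (an assumption
    of this sketch, simpler than a weighted FST with failure arcs)."""
    beam = [("", 0.0, trie)]
    for dist in steps:
        candidates = []
        for text, score, node in beam:
            for ch, logp in dist.items():
                nxt, matched = advance(trie, node, ch)
                reward = bonus if matched else 0.0
                candidates.append((text + ch, score + logp + reward, nxt))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beam = candidates[:beam_size]
    return [(text, score) for text, score, _ in beam]

# Hypothetical model output: slightly prefers a space over ':' at step 2.
example_steps = [
    {"4": 0.0},
    {":": math.log(0.4), " ": math.log(0.6)},
    {"3": 0.0},
    {"0": 0.0},
]
time_trie = build_time_trie()
unbiased_best = biased_beam_search(example_steps, time_trie, bonus=0.0)[0][0]
biased_best = biased_beam_search(example_steps, time_trie, bonus=1.0)[0][0]
```

In this toy setting the unbiased search ranks "4 30" first, while the biased search ranks the grammar-matching "4:30" first. A production system would presumably also handle partial matches that later fail, which this sketch does not attempt.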

Pages: 7809-7813

Page count: 5

