END TO END SPEECH RECOGNITION ERROR PREDICTION WITH SEQUENCE TO SEQUENCE LEARNING

Cited by: 0

Authors:

Serai, Prashant [1]

Stiff, Adam [1]

Fosler-Lussier, Eric [1]

Affiliation:

[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA

Source:

2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING | 2020

Funding:

U.S. National Science Foundation;

Keywords:

Speech Recognition; Error Prediction; Low Resource; Sequence to Sequence Neural Networks; Simulated ASR Errors;

DOI:

10.1109/icassp40776.2020.9054398

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

Simulating the errors made by a speech recognizer on plain text has proven useful for training downstream NLP tasks to be robust to real ASR errors at test time. Prior work in this domain has focused on modeling confusions at the phonetic level, using a lexicon to convert from words to phones and back, usually accompanied by an FST language model. We present a novel end-to-end model to simulate ASR errors. Our approach trains a convolutional sequence-to-sequence model that takes a word sequence as direct input and predicts a word sequence as output. The end-to-end modeling improves on prior published results for recall of recognition errors made by a Switchboard ASR system on unseen Fisher data; we also demonstrate cross-domain robustness by predicting errors made by an unrelated cloud-based ASR system on a Virtual Patient task.


Pages: 6339-6343

Page count: 5
