Semantic Enhancement Framework for Robust Speech Recognition

Cited by: 0

Authors:

Yang, Baochen ^{[1,2]}

Yu, Kai ^{[1,2]}

Affiliations:

[1] Shanghai Jiao Tong Univ, AI Inst, Dept Comp Sci & Engn, X LANCE Lab, MoE Key Lab Artificial Intelligence, Shanghai, Peoples R China

[2] State Key Lab Media Convergence Prod Technol & Sy, Shanghai, Peoples R China

Source:

MAN-MACHINE SPEECH COMMUNICATION, NCMMSC 2022 | 2023 / Vol. 1765

Keywords:

Semantic enhancement; Speech recognition

DOI:

10.1007/978-981-99-2401-1_7

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

Automatic speech recognition (ASR) is widely used in dialogue systems across many domains, where it serves as a crucial component. Because the output of the ASR system is the input to the downstream system, the semantic intelligibility of the recognition results has drawn wide attention, yet the problem remains unsolved. We propose a semantic enhancement framework that extracts global semantic information from the audio to guide the recognition results. We evaluate our method on the Wall Street Journal (WSJ) dataset. Compared with the baseline model, the proposed framework achieves relative word error rate (WER) improvements of 5.9% on the dev93 set and 9.1% on the eval92 set.
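The abstract reports its gains as relative WER improvements. As a minimal sketch of how such figures are commonly computed (this is not the authors' scoring code, and the record does not state which scoring tool they used), WER can be taken as the word-level edit distance divided by the reference length, and the relative improvement as the WER reduction divided by the baseline WER. The function names and the sample sentences below are hypothetical illustrations, not data from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_improvement(baseline_wer: float, proposed_wer: float) -> float:
    """Fraction by which the proposed system lowers the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - proposed_wer) / baseline_wer


# Hypothetical example sentences (not taken from WSJ).
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"WER: {wer:.3f}")
```

Under this definition, a relative improvement of 5.9% means the proposed system's WER is 94.1% of the baseline's WER on the same test set; it is not a drop of 5.9 absolute percentage points.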


Pages: 81-88

Page count: 8

