Wavoice: An mmWave-Assisted Noise-Resistant Speech Recognition System

Cited by: 2

Authors:

Liu, Tiantian [1]

Wang, Chao [1]

Li, Zhengxiong [2]

Huang, Ming-Chun [3]

Xu, Wenyao [4]

Lin, Feng [1]

Affiliations:

[1] Zhejiang Univ, Sch Cyber Sci & Technol, ZJU Hangzhou Global Sci & Technol Innovat Ctr, Hangzhou 310027, Peoples R China

[2] Univ Colorado Denver, Dept Comp Sci & Engn, Denver, CO USA

[3] Duke Kunshan Univ, Dept Data & Computat Sci, Suzhou 215316, Jiangsu, Peoples R China

[4] SUNY Buffalo, Dept Comp Sci & Engn, Buffalo, NY 14261 USA

Source:

ACM TRANSACTIONS ON SENSOR NETWORKS | 2024 / Vol. 20 / Issue 04

Keywords:

Multi-modal systems; mmWave sensing; speech recognition; biometrics; ENHANCEMENT

DOI:

10.1145/3597457

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

As automatic speech recognition evolves, deployment of the voice user interface (VUI) has expanded rapidly. Especially since the COVID-19 pandemic, the VUI has gained more attention in online communication owing to its non-contact property. However, the VUI is difficult to apply in public settings because ambient noise degrades the received audio signals. In this article, we propose Wavoice, the first noise-resistant multi-modal speech recognition system that fuses two distinct voice sensing modalities (i.e., millimeter-wave signals and audio signals from a microphone). One key contribution is a model of the inherent correlation between millimeter-wave and audio signals. Based on this model, Wavoice enables real-time noise-resistant voice activity detection and user targeting among multiple speakers. Additionally, we elaborate on two novel modules for multi-modal fusion embedded into the neural network, leading to accurate speech recognition. Extensive experiments demonstrate the effectiveness of Wavoice under adverse conditions: the character recognition error rate stays below 1% within a range of 7 m. In terms of robustness and accuracy, Wavoice considerably outperforms existing audio-only speech recognition methods, achieving lower character error and word error rates.

Pages: 29

50 items in total

[31] A noise-resistant signal-code generation algorithm based on the nonlinear system with cross feedback for secure telecommunications systems
Dubrovsky, V. V.
Popova, M. S.
2018 SYSTEMS OF SIGNAL SYNCHRONIZATION, GENERATING AND PROCESSING IN TELECOMMUNICATIONS (SYNCHROINFO), 2018
[32] Noisy Speech Training in MFCC-based Speech Recognition with Noise Suppression Toward Robot Assisted Autism therapy
Attawibulkul, Sujirat
Kaewkamnerdpong, Boonserm
Miyanaga, Yoshikazu
2017 10TH BIOMEDICAL ENGINEERING INTERNATIONAL CONFERENCE (BMEICON), 2017
[33] Adversarial Example Devastation and Detection on Speech Recognition System by Adding Random Noise
Dong, Mingyu
Yan, Diqun
Gong, Yongkang
JOURNAL OF THE AUDIO ENGINEERING SOCIETY, 2023, 71 (1-2): 34 - 44
[34] Noise Robust Speech Recognition System using Mel Cepstral and Genetic Algorithm
Mamta, Garg
Shatru, Arora Ajat
Savita, Gupta
2016 INTERNATIONAL CONFERENCE ON ELECTRICAL, ELECTRONICS, AND OPTIMIZATION TECHNIQUES (ICEEOT), 2016: 3151 - 3155
[35] A Noise-Robust Speech Recognition System Based on Wavelet Neural Network
Wang, Yiping
Zhao, Zhefeng
ARTIFICIAL INTELLIGENCE AND COMPUTATIONAL INTELLIGENCE, PT III, 2011, 7004: 392 - 397
[36] Speech recognition in noise for cochlear implantees with a two-microphone monaural adaptive noise reduction system
Wouters, J
Vanden Berghe, J
EAR AND HEARING, 2001, 22 (05): 420 - 430
[37] ATR parallel decoding based speech recognition system robust to noise and speaking styles
Matsuda, S
Jitsuhiro, T
Markov, K
Nakamura, S
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2006, E89D (03): 989 - 997
[38] Research of a Non-Specific Person Noise-Robust Speech Recognition System
Bai, Jing
Zhang, Xueying
2009 5TH INTERNATIONAL CONFERENCE ON WIRELESS COMMUNICATIONS, NETWORKING AND MOBILE COMPUTING, VOLS 1-8, 2009: 2014 - 2017
[39] Noise Robust Tamil Speech Word Recognition System by Means of PAC Features with ANFIS
Rojathai, S.
Venkatesulu, M.
2014 IEEE/ACIS 13TH INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION SCIENCE (ICIS), 2014, : 425 - +
[40] Speech recognition system in high noise background based on discriminative learning of environmental features
Lu, Cheng-Guo
Han, Ji-Qing
Wang, Cheng-Fa
Zhang, Lei
Harbin Gongye Daxue Xuebao/Journal of Harbin Institute of Technology, 2003, 35 (02): 134 - 137