Wavoice: An mmWave-Assisted Noise-Resistant Speech Recognition System

Cited by: 2
Authors
Liu, Tiantian [1]
Wang, Chao [1]
Li, Zhengxiong [2]
Huang, Ming-Chun [3]
Xu, Wenyao [4]
Lin, Feng [1]
Affiliations
[1] Zhejiang Univ, Sch Cyber Sci & Technol, ZJU Hangzhou Global Sci & Technol Innovat Ctr, Hangzhou 310027, Peoples R China
[2] Univ Colorado Denver, Dept Comp Sci & Engn, Denver, CO USA
[3] Duke Kunshan Univ, Dept Data & Computat Sci, Suzhou 215316, Jiangsu, Peoples R China
[4] SUNY Buffalo, Dept Comp Sci & Engn, Buffalo, NY 14261 USA
Keywords
Multi-modal systems; mmWave sensing; speech recognition; biometrics; enhancement
DOI
10.1145/3597457
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
As automatic speech recognition evolves, deployment of the voice user interface (VUI) has expanded rapidly. Since the COVID-19 pandemic in particular, the VUI has gained more attention in online communication owing to its contactless nature. However, the VUI is difficult to apply in public settings because ambient noise degrades the received audio signals. In this article, we propose Wavoice, the first noise-resistant multi-modal speech recognition system that fuses two distinct voice-sensing modalities (i.e., millimeter-wave signals and audio signals from a microphone). One key contribution is a model of the inherent correlation between millimeter-wave and audio signals. Building on this model, Wavoice enables real-time, noise-resistant voice activity detection and targeting of a specific user among multiple speakers. Additionally, we present two novel modules for multi-modal fusion embedded in the neural network, leading to accurate speech recognition. Extensive experiments demonstrate the effectiveness of Wavoice under adverse conditions: the character recognition error rate stays below 1% within a range of 7 m. In terms of robustness and accuracy, Wavoice considerably outperforms existing audio-only speech recognition methods, achieving lower character error and word error rates.
Pages: 29