Wavoice: An mmWave-Assisted Noise-Resistant Speech Recognition System

Cited by: 2
Authors
Liu, Tiantian [1]
Wang, Chao [1]
Li, Zhengxiong [2]
Huang, Ming-Chun [3]
Xu, Wenyao [4]
Lin, Feng [1]
Affiliations
[1] Zhejiang Univ, Sch Cyber Sci & Technol, ZJU Hangzhou Global Sci & Technol Innovat Ctr, Hangzhou 310027, Peoples R China
[2] Univ Colorado Denver, Dept Comp Sci & Engn, Denver, CO USA
[3] Duke Kunshan Univ, Dept Data & Computat Sci, Suzhou 215316, Jiangsu, Peoples R China
[4] SUNY Buffalo, Dept Comp Sci & Engn, Buffalo, NY 14261 USA
Keywords
Multi-modal systems; mmWave sensing; speech recognition; biometrics; ENHANCEMENT;
DOI
10.1145/3597457
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
As automatic speech recognition has evolved, deployment of the voice user interface (VUI) has expanded rapidly. Since the COVID-19 pandemic in particular, the VUI has gained more attention in online communication owing to its contactless nature. However, the VUI is difficult to apply in public settings because various ambient noises degrade the received audio signals. In this article, we propose Wavoice, the first noise-resistant multi-modal speech recognition system that fuses two distinct voice-sensing modalities (i.e., millimeter-wave signals and audio signals from a microphone). One key contribution is a model of the inherent correlation between millimeter-wave and audio signals. Building on this model, Wavoice enables real-time, noise-resistant voice activity detection and targeting of the user among multiple speakers. Additionally, we present two novel modules for multi-modal fusion embedded in the neural network, leading to accurate speech recognition. Extensive experiments demonstrate the effectiveness of Wavoice under adverse conditions: the character recognition error rate is below 1% within a range of 7 m. In terms of robustness and accuracy, Wavoice considerably outperforms existing audio-only speech recognition methods, with lower character error and word error rates.
Pages: 29
Related papers
50 in total
  • [31] A noise-resistant signal-code generation algorithm based on the nonlinear system with cross feedback for secure telecommunications systems
    Dubrovsky, V. V.
    Popova, M. S.
    2018 SYSTEMS OF SIGNAL SYNCHRONIZATION, GENERATING AND PROCESSING IN TELECOMMUNICATIONS (SYNCHROINFO), 2018
  • [32] Noisy Speech Training in MFCC-based Speech Recognition with Noise Suppression Toward Robot Assisted Autism therapy
    Attawibulkul, Sujirat
    Kaewkamnerdpong, Boonserm
    Miyanaga, Yoshikazu
    2017 10TH BIOMEDICAL ENGINEERING INTERNATIONAL CONFERENCE (BMEICON), 2017
  • [33] Adversarial Example Devastation and Detection on Speech Recognition System by Adding Random Noise
    Dong, Mingyu
    Yan, Diqun
    Gong, Yongkang
    JOURNAL OF THE AUDIO ENGINEERING SOCIETY, 2023, 71 (1-2): 34-44
  • [34] Noise Robust Speech Recognition System using Mel Cepstral and Genetic Algorithm
    Mamta, Garg
    Shatru, Arora Ajat
    Savita, Gupta
    2016 INTERNATIONAL CONFERENCE ON ELECTRICAL, ELECTRONICS, AND OPTIMIZATION TECHNIQUES (ICEEOT), 2016: 3151-3155
  • [35] A Noise-Robust Speech Recognition System Based on Wavelet Neural Network
    Wang, Yiping
    Zhao, Zhefeng
    ARTIFICIAL INTELLIGENCE AND COMPUTATIONAL INTELLIGENCE, PT III, 2011, 7004: 392-397
  • [36] Speech recognition in noise for cochlear implantees with a two-microphone monaural adaptive noise reduction system
    Wouters, J
    Vanden Berghe, J
    EAR AND HEARING, 2001, 22 (05): 420-430
  • [37] ATR parallel decoding based speech recognition system robust to noise and speaking styles
    Matsuda, S
    Jitsuhiro, T
    Markov, K
    Nakamura, S
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2006, E89D (03): 989-997
  • [38] Research of a Non-Specific Person Noise-Robust Speech Recognition System
    Bai, Jing
    Zhang, Xueying
    2009 5TH INTERNATIONAL CONFERENCE ON WIRELESS COMMUNICATIONS, NETWORKING AND MOBILE COMPUTING, VOLS 1-8, 2009: 2014-2017
  • [39] Noise Robust Tamil Speech Word Recognition System by Means of PAC Features with ANFIS
    Rojathai, S.
    Venkatesulu, M.
    2014 IEEE/ACIS 13TH INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION SCIENCE (ICIS), 2014: 425-+
  • [40] Speech recognition system in high noise background based on discriminative learning of environmental features
    Lu, Cheng-Guo
    Han, Ji-Qing
    Wang, Cheng-Fa
    Zhang, Lei
    Harbin Gongye Daxue Xuebao/Journal of Harbin Institute of Technology, 2003, 35 (02): 134-137