Wavoice: An mmWave-Assisted Noise-Resistant Speech Recognition System

Cited by: 2
Authors
Liu, Tiantian [1]
Wang, Chao [1]
Li, Zhengxiong [2]
Huang, Ming-Chun [3]
Xu, Wenyao [4]
Lin, Feng [1]
Affiliations
[1] Zhejiang Univ, Sch Cyber Sci & Technol, ZJU Hangzhou Global Sci & Technol Innovat Ctr, Hangzhou 310027, Peoples R China
[2] Univ Colorado Denver, Dept Comp Sci & Engn, Denver, CO USA
[3] Duke Kunshan Univ, Dept Data & Computat Sci, Suzhou 215316, Jiangsu, Peoples R China
[4] SUNY Buffalo, Dept Comp Sci & Engn, Buffalo, NY 14261 USA
Keywords
Multi-modal systems; mmWave sensing; speech recognition; biometrics; ENHANCEMENT;
DOI
10.1145/3597457
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
As automatic speech recognition evolves, deployment of the voice user interface (VUI) has expanded rapidly. Especially since the COVID-19 pandemic, the VUI has gained more attention in online communication owing to its non-contact property. However, the VUI is difficult to apply in public settings because ambient noise degrades the received audio signals. In this article, we propose Wavoice, the first noise-resistant multi-modal speech recognition system that fuses two distinct voice-sensing modalities (i.e., millimeter-wave signals and audio signals from a microphone). One key contribution is a model of the inherent correlation between millimeter-wave and audio signals. Based on this model, Wavoice enables real-time, noise-resistant voice activity detection and user targeting among multiple speakers. Additionally, we develop two novel multi-modal fusion modules embedded in the neural network, leading to accurate speech recognition. Extensive experiments demonstrate the effectiveness of Wavoice under adverse conditions: the character error rate stays below 1% within a range of 7 m. In terms of robustness and accuracy, Wavoice considerably outperforms existing audio-only speech recognition methods, achieving lower character error and word error rates.
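The abstract reports results as character error rate and word error rate. This record does not include the authors' evaluation code, so the following is only a minimal sketch of the conventional definitions of these two metrics (edit distance divided by reference length). The function names `edit_distance`, `cer`, and `wer` and the sample sentences are illustrative assumptions, not taken from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences
    (minimum number of substitutions, insertions, and deletions)."""
    prev = list(range(len(hyp) + 1))  # distances for an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting i reference items to match an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character-level edits / reference characters.
    Assumption: spaces count as characters; conventions vary."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


def wer(reference, hypothesis):
    """Word error rate: word-level edits / reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


if __name__ == "__main__":
    # Hypothetical transcript pair, for illustration only.
    ref = "turn on the light"
    hyp = "turn on the night"
    print(f"CER = {cer(ref, hyp):.4f}, WER = {wer(ref, hyp):.4f}")
```

Under these definitions, a character error rate below 1% corresponds to fewer than one character-level edit per 100 reference characters. Details such as text normalization and whether spaces are counted may differ in the paper's evaluation.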
Pages: 29