Wavoice: An mmWave-Assisted Noise-Resistant Speech Recognition System

Cited by: 2

Authors:

Liu, Tiantian [1]

Wang, Chao [1]

Li, Zhengxiong [2]

Huang, Ming-Chun [3]

Xu, Wenyao [4]

Lin, Feng [1]

Affiliations:

[1] Zhejiang Univ, Sch Cyber Sci & Technol, ZJU Hangzhou Global Sci & Technol Innovat Ctr, Hangzhou 310027, Peoples R China

[2] Univ Colorado Denver, Dept Comp Sci & Engn, Denver, CO USA

[3] Duke Kunshan Univ, Dept Data & Computat Sci, Suzhou 215316, Jiangsu, Peoples R China

[4] SUNY Buffalo, Dept Comp Sci & Engn, Buffalo, NY 14261 USA

Source:

ACM TRANSACTIONS ON SENSOR NETWORKS | 2024 / Vol. 20 / Issue 4

Keywords:

Multi-modal systems; mmWave sensing; speech recognition; biometrics; ENHANCEMENT;

DOI:

10.1145/3597457

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline code:

0812

Abstract:

As automatic speech recognition matures, voice user interfaces (VUIs) have been deployed ever more widely. Especially since the COVID-19 pandemic, VUIs have drawn increased attention in online communication owing to their contactless nature. However, VUIs remain hard to use in public settings because various ambient noises degrade the received audio signals. In this article, we propose Wavoice, the first noise-resistant multi-modal speech recognition system that fuses two distinct voice-sensing modalities: millimeter-wave (mmWave) signals and audio signals from a microphone. One key contribution is modeling the inherent correlation between mmWave and audio signals. Building on this correlation, Wavoice achieves real-time, noise-resistant voice activity detection and can target the intended user among multiple speakers. Additionally, we design two novel multi-modal fusion modules embedded in the neural network, which yield accurate speech recognition. Extensive experiments demonstrate the effectiveness of Wavoice under adverse conditions: the character error rate stays below 1% within a range of 7 m. In terms of both robustness and accuracy, Wavoice considerably outperforms existing audio-only speech recognition methods, achieving lower character error rates (CER) and word error rates (WER).
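The abstract reports accuracy as character error rate (CER) and word error rate (WER). This record does not describe Wavoice's exact scoring protocol, so the sketch below simply assumes the conventional definition: the Levenshtein edit distance between the reference and recognized transcripts, divided by the reference length. The function names (`edit_distance`, `error_rate`, `cer`, `wer`) and the example sentence are hypothetical and chosen only for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: minimum substitutions, deletions, and insertions
    needed to turn sequence `ref` into sequence `hyp`."""
    # prev[j] = distance between the first i-1 items of ref and the first j items of hyp
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution (or match if cost == 0)
            )
        prev = curr
    return prev[-1]


def error_rate(ref_units, hyp_units):
    """Edit distance normalized by the length of the reference sequence."""
    if not ref_units:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref_units, hyp_units) / len(ref_units)


def cer(ref_text, hyp_text):
    # Character error rate; spaces count as characters in this sketch
    # (scoring conventions differ on this point).
    return error_rate(list(ref_text), list(hyp_text))


def wer(ref_text, hyp_text):
    # Word error rate over whitespace-separated tokens.
    return error_rate(ref_text.split(), hyp_text.split())


if __name__ == "__main__":
    ref, hyp = "turn on the light", "turn of the light"
    print(f"CER = {cer(ref, hyp):.4f}")  # prints CER = 0.0588 (1 substitution / 17 characters)
    print(f"WER = {wer(ref, hyp):.4f}")  # prints WER = 0.2500 (1 substitution / 4 words)
```

The example shows why the two metrics diverge: a single wrong character costs one of 17 characters but one of only 4 words, so WER penalizes the same mistake more heavily on short utterances.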

Pages: 29
