A Real-Time Sound Source Localization System for Robotic Vacuum Cleaners With a Microphone Array

Cited by: 0
Authors
Kim, Jun Hyung [1]
Kim, Taehan [1]
Kim, Seokhyun [2]
Song, Ju-Man [2]
Park, Yongjin [2]
Kim, Minook [2]
Son, Jungkwan [2]
Jeong, Jimann [2]
Park, Hyung-Min [1]
Affiliations
[1] Sogang Univ, Dept Elect Engn, Seoul 04107, South Korea
[2] LG Elect CTO, Seoul 06772, South Korea
Keywords
Speech enhancement; Robots; Real-time systems; Location awareness; Computational modeling; Correlation; Vacuum systems; Sensors; Direction-of-arrival estimation; Vectors; Deep neural networks (DNNs); ego-noise reduction; microphone array; real-time speech enhancement; sound source localization (SSL); TRACKING
DOI
10.1109/JSEN.2024.3500007
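Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract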
With the progress of artificial intelligence (AI) technology, home appliances are becoming more advanced to enhance our quality of life. Many smart devices support speech interfaces, including voice commands and user location tracking. However, robotic vacuum cleaners generate strong ego-noise that distorts microphone signals, making it difficult to estimate the user's location. To solve this problem, we propose a real-time sound source localization (SSL) system for a robotic vacuum cleaner equipped with a microphone array. The system consists of speech enhancement, voice activity detection (VAD), and SSL modules. The speech enhancement module includes TRU-Net-Light, which requires less computation than the tiny recurrent U-Net (TRU-Net) while achieving similar speech enhancement performance. TRU-Net-Light uses fewer channels to reduce the model size and applies frequency-axis multihead self-attention to boost representational capacity. The finite state machine-based VAD detects voice-active periods using the output of the speech enhancement module. Furthermore, we present a mask-weighted difference correlation vector and singular value decomposition (SVD) with the smoothed coherence transform (DSVD-SCOT), which achieves robust localization performance in severely noisy environments. On the robotic vacuum cleaner used in the experiments, the localization accuracy of the SSL system was 97.9% and 84.0% at signal-to-noise ratios (SNRs) of -3 and -8 dB, respectively. The proposed system ran in real time, with a real-time factor (RTF) of 0.378, on a single Kryo 585 Silver core of the RB5 platform. A demo of the proposed system is available at https://youtu.be/3d3Cr-cs9aY.
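As a reading aid for the RTF figure in the abstract, the following is a minimal sketch of how a real-time factor is conventionally computed: processing time divided by the duration of the audio processed, with values below 1 meaning the system keeps up with its input. The function names and the 10 s / 3.78 s timings are hypothetical, chosen only so that the ratio matches the reported 0.378; they are not measurements from the paper.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return RTF = processing time / duration of the audio processed."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def is_real_time(rtf: float) -> bool:
    """A system keeps up with its input stream when RTF is below 1."""
    return rtf < 1.0


# Hypothetical timings (assumption, not from the paper): 10 s of audio
# processed in 3.78 s of compute gives the reported RTF of 0.378.
rtf = real_time_factor(3.78, 10.0)

# Fraction of each second of audio during which the core is left idle.
headroom = 1.0 - rtf
```

Under this reading, an RTF of 0.378 leaves roughly 62% of the single core's time free for other tasks on the device.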
Pages: 1243-1252
Page count: 10