Deep ad-hoc beamforming

Cited by: 16
Authors
Zhang, Xiao-Lei [1 ,2 ]
Affiliations
[1] Northwestern Polytech Univ Shenzhen, Inst Res & Dev, Shenzhen, Peoples R China
[2] Northwestern Polytech Univ, Sch Marine Sci & Technol, Ctr Intelligent Acoust & Immers Commun, Shenzhen, Peoples R China
Funding
US National Science Foundation;
Keywords
Adaptive beamforming; Ad-hoc microphone array; Channel selection; Deep learning; Distributed microphone array; SPEECH ENHANCEMENT; SENSOR NETWORKS; TIME; SEPARATION; CLASSIFICATION; INFORMATION; FREQUENCY; FRAMEWORK; DECOMPOSITION; FILTERBANK;
DOI
10.1016/j.csl.2021.101201
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Far-field speech processing is an important and challenging problem. In this paper, we propose deep ad-hoc beamforming, a deep-learning-based multichannel speech enhancement framework built on ad-hoc microphone arrays, to address the problem. It contains three novel components. First, it combines ad-hoc microphone arrays with deep-learning-based multichannel speech enhancement, which significantly reduces the probability that far-field acoustic environments occur. Second, it groups the microphones around the speech source into a local microphone array via a supervised channel selection framework based on deep neural networks. Third, it develops a simple time synchronization framework to synchronize channels that have different time delays. Beyond these novelties, the proposed model is trained in a single-channel fashion, so it can easily incorporate new developments in speech processing techniques. Its test stage is also flexible, accommodating any number of microphones without retraining or modifying the framework. We have developed many implementations of the proposed framework and conducted extensive experiments in scenarios where the locations of the speech sources are far-field, random, and blind to the microphones. Results on speech enhancement tasks show that our method outperforms its counterpart operating with linear microphone arrays by a considerable margin in both diffuse-noise reverberant environments and point-source-noise reverberant environments. We have also tested the framework with different handcrafted features. Results show that, although good features lead to high performance, they do not affect the conclusion on the effectiveness of the proposed framework. (c) 2021 Elsevier Ltd. All rights reserved.
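The abstract's pipeline (select the channels nearest the source, time-align them, then combine) can be sketched minimally as follows. This is an illustrative assumption-laden sketch, not the paper's implementation: a per-channel energy score stands in for the paper's supervised DNN channel-selection network, and plain cross-correlation stands in for its time synchronization framework; the function name `select_and_align` and all parameters are hypothetical.

```python
import numpy as np

def select_and_align(channels, k=4):
    """Pick the k highest-quality channels of an ad-hoc array and
    time-align them to the strongest one (delay-and-sum style).

    channels: list of equal-length 1-D float arrays, one per microphone.
    """
    # 1. Channel selection: score each microphone. A simple energy
    #    proxy here; the paper trains a DNN to rank channels instead.
    scores = np.array([np.mean(c ** 2) for c in channels])
    picked = np.argsort(scores)[::-1][:k]

    # 2. Time synchronization: cross-correlate each picked channel
    #    against the strongest one and undo the estimated lag.
    ref = channels[picked[0]]
    aligned = []
    for i in picked:
        xc = np.correlate(channels[i], ref, mode="full")
        lag = int(np.argmax(xc)) - (len(ref) - 1)
        aligned.append(np.roll(channels[i], -lag))

    # 3. Combine the aligned channels (uniform weights here).
    return np.mean(aligned, axis=0), picked
```

Because selection and alignment operate per channel against a single reference, the sketch accepts any number of microphones at test time, mirroring the flexibility claimed for the framework.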
Pages: 18
17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, :1981-1985