A LEARNING-BASED APPROACH TO DIRECTION OF ARRIVAL ESTIMATION IN NOISY AND REVERBERANT ENVIRONMENTS

Cited by: 0
Authors
Xiao, Xiong [1]
Zhao, Shengkui [2]
Zhong, Xionghu [3]
Jones, Douglas L. [2]
Chng, Eng Siong [3]
Li, Haizhou [3, 4]
Affiliations
[1] Nanyang Technol Univ, Temasek Lab, Singapore, Singapore
[2] Adv Digital Sci Ctr, Singapore, Singapore
[3] Nanyang Technol Univ, Sch Comp Engn, Singapore, Singapore
[4] Inst Infocomm Res, Dept Human Language Technol, Singapore, Singapore
Source
2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP) | 2015
Keywords
microphone arrays; direction of arrival; least squares; machine learning; neural networks; HISTOGRAM EQUALIZATION; LOCALIZATION; ADAPTATION;
DOI
Not available
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
This paper presents a learning-based approach to direction of arrival (DOA) estimation from microphone array input. Traditional signal processing methods, such as the classic least squares (LS) method, rely on strong assumptions about the signal model and on accurate estimates of the time delay of arrival (TDOA). They work well only in relatively clean conditions and degrade under noise and reverberation. We propose a learning-based approach that learns robust DOA estimation from a large amount of simulated noisy and reverberant microphone array input. Specifically, we extract features from generalised cross-correlation (GCC) vectors and use a multilayer perceptron neural network to learn the nonlinear mapping from these features to the DOA. One advantage of the learning-based method is that the DOA estimates become more accurate as more training data becomes available. Experimental results on simulated data show that the proposed learning-based method produces much better results than the state-of-the-art LS method. Tests on real data recorded in meeting rooms show improved root-mean-square error (RMSE) compared with the LS method.
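To illustrate the feature-extraction step the abstract describes, here is a minimal sketch of GCC vectors computed for every microphone pair and concatenated into one feature vector. The abstract does not give the weighting, lag range, or feature layout, so the PHAT weighting, the `max_lag` parameter, and the function names `gcc_phat` and `gcc_features` are all assumptions made for illustration, not the authors' implementation.

```python
import itertools

import numpy as np


def gcc_phat(x1, x2, max_lag):
    """GCC vector with PHAT weighting (an assumed choice) for lags -max_lag..max_lag."""
    n = len(x1) + len(x2)                 # zero-pad so the correlation is linear, not circular
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X1 * np.conj(X2)              # cross-power spectrum
    cross = cross / (np.abs(cross) + 1e-12)  # PHAT: discard magnitude, keep phase
    cc = np.fft.irfft(cross, n=n)
    # Reorder so index 0 is lag -max_lag and index max_lag is lag 0.
    return np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))


def gcc_features(signals, max_lag):
    """Concatenate GCC vectors over all microphone pairs (hypothetical feature layout).

    signals: array of shape (num_mics, num_samples).
    """
    pairs = itertools.combinations(range(signals.shape[0]), 2)
    return np.concatenate([gcc_phat(signals[i], signals[j], max_lag) for i, j in pairs])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    source = rng.standard_normal(1024)
    delay = 5                              # channel 0 lags channel 1 by 5 samples
    ch0 = np.concatenate((np.zeros(delay), source))
    ch1 = np.concatenate((source, np.zeros(delay)))
    vec = gcc_phat(ch0, ch1, max_lag=10)
    print("peak lag:", int(np.argmax(vec)) - 10)
```

A vector like the one returned by `gcc_features` would then be the input to a multilayer perceptron trained to regress the DOA; the network architecture and training setup are not given in the abstract and are left out here.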
Pages: 2814-2818
Page count: 5
Related papers
27 in total
[1]   IMAGE METHOD FOR EFFICIENTLY SIMULATING SMALL-ROOM ACOUSTICS [J].
ALLEN, JB ;
BERKLEY, DA .
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1979, 65 (04) :943-950
[2]  
[Anonymous], 2009, Distant speech recognition
[3]  
[Anonymous], IEEE T SPEECH AUDIO
[4]  
[Anonymous], IEEE T SPEECH AUDIO
[5]  
[Anonymous], 2009, NEURAL NETWORKS LEAR
[6]   Adaptive eigenvalue decomposition algorithm for passive acoustic source localization [J].
Benesty, J .
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2000, 107 (01) :384-391
[7]   Histogram equalization of speech representation for robust speech recognition [J].
de la Torre, A ;
Peinado, AM ;
Segura, JC ;
Pérez-Córdoba, JL ;
Benítez, MC ;
Rubio, AJ .
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2005, 13 (03) :355-366
[8]  
Dibiase J. H., 2001, THESIS
[9]   Sound localization in reverberant environment based on the model of the precedence effect [J].
Huang, J ;
Ohnishi, N ;
Sugie, N .
IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 1997, 46 (04) :842-846
[10]   Contrast enhancement using brightness preserving bi-histogram equalization [J].
Kim, YT .
IEEE TRANSACTIONS ON CONSUMER ELECTRONICS, 1997, 43 (01) :1-8