Deep and CNN fusion method for binaural sound source localisation

Cited by: 14
Authors
Jiang, Shilong [1 ]
Wu, Lulu [2 ]
Yuan, Peipei [2 ]
Sun, Yongheng [2 ]
Liu, Hong [2 ]
Affiliations
[1] PKU KUST Shenzhen Hong Kong Inst, Shenzhen, Peoples R China
[2] Peking Univ, Shenzhen Grad Sch, Key Lab Machine Percept, Shenzhen, Peoples R China
Source
JOURNAL OF ENGINEERING-JOE | 2020, Vol. 2020, Issue 13
Funding
National Natural Science Foundation of China;
Keywords
feature extraction; acoustic signal processing; probability; convolutional neural nets; signal classification; correlation methods; CNN; binaural sound source localisation; convolutional neural network; cross-correlation function; binaural signals; deep neural network; azimuth classification task; CCF-ILD features extraction; interaural level differences; maximum posterior probability; MODEL;
DOI
10.1049/joe.2019.1207
Chinese Library Classification
T [Industrial Technology];
Discipline code
08;
Abstract
In binaural sound source localisation, front-back confusion is a challenging problem, particularly when localising sources in noisy or reverberant environments. Hence, a novel algorithm fusing a deep neural network (DNN) and a convolutional neural network (CNN) is proposed to address this issue. First, joint features, consisting of interaural level differences (ILDs) and the cross-correlation function (CCF) within a limited lag range, are extracted from the binaural signals. Second, with the extracted CCF-ILD features, the CNN is used for the front-back classification task, while the DNN is used for the azimuth classification task. The front-back features extracted by the CNN provide additional information for the localisation task. In addition, an angle-loss function is designed to avoid overfitting and to improve the generalisation ability of the method in adverse acoustic conditions. Finally, the two branches are concatenated and followed by an output layer that generates the posterior probabilities of the azimuth angles; the azimuth with the maximum posterior probability is chosen as the direction of the sound source. Experimental results demonstrate the effectiveness of the authors' method for front-back decision and azimuth estimation in noisy and reverberant environments.
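The following is a minimal sketch of the CCF-ILD feature extraction described in the abstract, not the authors' exact front end: the frame length, hop size, lag range, and the broadband (rather than subband) ILD are illustrative assumptions.

```python
import numpy as np

def ccf_ild_features(left, right, frame_len=512, hop=256, max_lag=16, eps=1e-8):
    """Frame-wise normalised CCF within +/- max_lag samples, concatenated with an ILD (dB)."""
    n_frames = 1 + (len(left) - frame_len) // hop
    feats = []
    for i in range(n_frames):
        l = left[i * hop: i * hop + frame_len]
        r = right[i * hop: i * hop + frame_len]
        # full cross-correlation; index frame_len - 1 corresponds to zero lag
        full = np.correlate(l, r, mode="full")
        centre = frame_len - 1
        ccf = full[centre - max_lag: centre + max_lag + 1]
        ccf = ccf / (np.linalg.norm(l) * np.linalg.norm(r) + eps)
        # interaural level difference of the same frame, in dB
        ild = 10.0 * np.log10((np.sum(l ** 2) + eps) / (np.sum(r ** 2) + eps))
        feats.append(np.concatenate([ccf, [ild]]))
    return np.stack(feats)  # shape: (n_frames, 2 * max_lag + 2)
```

A two-branch fusion model could then be assembled as sketched below (Keras functional API). The layer sizes, the number of azimuth classes, and the plain cross-entropy loss are placeholders; the paper's custom angle-loss term and its specific CNN/DNN configurations are not reproduced here.

```python
from tensorflow.keras import layers, Model

def build_fusion_model(n_frames, feat_dim, n_azimuths=72):
    """CNN branch + fully connected (DNN) branch, concatenated before a softmax over azimuths."""
    inp = layers.Input(shape=(n_frames, feat_dim, 1))

    # CNN branch (front-back oriented features in the paper)
    x = layers.Conv2D(16, (3, 3), activation="relu", padding="same")(inp)
    x = layers.MaxPooling2D((2, 2))(x)
    x = layers.Conv2D(32, (3, 3), activation="relu", padding="same")(x)
    x = layers.Flatten()(x)

    # DNN branch (azimuth oriented features in the paper)
    y = layers.Flatten()(inp)
    y = layers.Dense(256, activation="relu")(y)
    y = layers.Dense(128, activation="relu")(y)

    # concatenate both branches; the output layer yields azimuth posterior probabilities
    z = layers.Concatenate()([x, y])
    out = layers.Dense(n_azimuths, activation="softmax")(z)

    model = Model(inp, out)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    return model

# The estimated direction is the azimuth class with maximum posterior probability, e.g.:
# azimuth_idx = np.argmax(model.predict(features[None, ..., None]), axis=-1)
```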
Pages: 511-516
Number of pages: 6