Speakers counting by proposed nested microphone array in combination with limited space SRP

Cited by: 1
Authors
Dehghan Firoozabadi, Ali [1 ]
Irarrazaval, Pablo [2 ]
Adasme, Pablo [3 ]
Zabala-Blanco, David [4 ]
Palacios-Jativa, Pablo [5 ]
Durney, Hugo [1 ]
Sanhueza, Miguel [1 ]
Azurdia-Meza, Cesar [5 ]
Affiliations
[1] Univ Tecnol Metropolitana, Dept Elect, Av Jose Pedro Alessandri 1242, Santiago 7800002, Chile
[2] Pontificia Univ Catolica Chile, Elect Engn Dept, Santiago, Chile
[3] Univ Santiago Chile, Elect Engn Dept, Av Ecuador 3519, Santiago 9170124, Chile
[4] Univ Catolica Maule, Ctr Invest Estudios Avanzados Maule CIEAM, Vicerrectoria Invest & Postgrad, Talca 3466706, Chile
[5] Univ Chile, Dept Elect Engn, Santiago 8370451, Chile
Source
29TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2021) | 2021
Keywords
Speakers counting; nested microphone array; subband processing; classification; filtering; SIGNALS; NUMBER;
DOI
10.23919/EUSIPCO54536.2021.9616309
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
This paper presents a novel method for estimating the number of speakers using a microphone array. First, a 3D snowflake nested microphone array (SNMA) is proposed for recording the speech signals. Next, the steered response power (SRP) algorithm is applied per subband, under limited-space conditions, to all microphone pairs of the subarrays. A weighted average of the subband limited-space SRPs (LSRP) is then computed, and the final energy map is compared with the histogram of the maxima of the SRP function over the subbands and time frames. The candidate points that pass this comparison are grouped by unsupervised K-means clustering, and the number of speakers is estimated with the silhouette criterion. The accuracy of the proposed method is compared with that of the PENS, i-vector PLDA, and wavelet-GEVD algorithms. The results show that the proposed method outperforms these previous approaches.
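The last stage described in the abstract (K-means clustering of candidate points, with the silhouette criterion choosing the number of clusters) can be illustrated with a minimal sketch. This is not the authors' implementation: the candidate points here are synthetic 3D positions rather than LSRP output, and the function names (`kmeans`, `mean_silhouette`, `estimate_speaker_count`), the farthest-point initialisation, and the search range `k_max` are all assumptions made for illustration.

```python
import numpy as np


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns one integer label per point."""
    rng = np.random.default_rng(seed)
    # Farthest-point initialisation (an illustrative choice, not from the paper).
    centres = [points[rng.integers(len(points))]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centres], axis=0)
        centres.append(points[np.argmax(d)])
    centres = np.array(centres, dtype=float)
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centres[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        # Recompute each centre as the mean of its members; keep it if the cluster is empty.
        new_centres = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    dists = np.linalg.norm(points[:, None, :] - centres[None, :, :], axis=2)
    return np.argmin(dists, axis=1)


def mean_silhouette(points, labels):
    """Mean silhouette value s = (b - a) / max(a, b) over all points."""
    clusters = np.unique(labels)
    if len(clusters) < 2:
        return -1.0  # silhouette is undefined for a single cluster
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    scores = np.zeros(len(points))
    for i in range(len(points)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            continue  # singleton cluster: silhouette taken as 0
        a = dists[i, same].sum() / (n_same - 1)  # mean distance within own cluster
        b = min(dists[i, labels == c].mean() for c in clusters if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores.mean()


def estimate_speaker_count(points, k_max=6):
    """Try k = 2..k_max and return the k with the highest mean silhouette."""
    scores = {k: mean_silhouette(points, kmeans(points, k)) for k in range(2, k_max + 1)}
    return max(scores, key=scores.get), scores


# Synthetic stand-in for candidate source positions: three tight groups in 3D.
rng = np.random.default_rng(1)
true_positions = np.array([[1.0, 1.0, 1.5], [4.0, 1.0, 1.5], [2.5, 4.0, 1.5]])
candidates = np.vstack([p + 0.1 * rng.standard_normal((40, 3)) for p in true_positions])

count, scores = estimate_speaker_count(candidates)
print("estimated number of speakers:", count)
```

Because the silhouette value is only defined for two or more clusters, this sketch cannot return a count of one; a single-speaker case would need a separate check.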
Pages: 271-275
Number of pages: 5