Multi-Speaker Direction of Arrival Estimation using SRP-PHAT Algorithm with a Weighted Histogram

Cited by: 0
Authors
Hadad, Elior [1 ]
Gannot, Sharon [1 ]
Affiliations
[1] Bar Ilan Univ, Fac Engn, IL-5290002 Ramat Gan, Israel
Keywords
LOCALIZATION;
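DOI
Not available
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology];
Subject classification codes
0808 ; 0809 ;
Abstract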
A direction of arrival (DOA) estimator for concurrent speakers in a reverberant environment is presented. The DOA estimation task is formulated in the short-time Fourier transform (STFT) domain in two stages. In the first stage, a single narrow-band DOA is selected per time-frequency (T-F) bin, since the speech sources are assumed to exhibit disjoint activity in the STFT domain. The narrow-band DOA is obtained as the angle that maximizes the narrow-band steered response power phase transform (SRP-PHAT) localization spectrum at that T-F bin. In addition, for each narrow-band DOA, a quality measure is calculated, which provides the confidence in the estimated decision. In the second stage, the wide-band localization spectrum is calculated using a weighted histogram of the narrow-band DOAs, with the quality measures as weights. Finally, the wide-band DOA estimates are obtained by selecting the peaks in the wide-band localization spectrum. The results of our experimental study demonstrate the benefit of the proposed algorithm as compared to the wide-band SRP-PHAT algorithm in a reverberant environment.
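The two stages described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the far-field linear-array geometry, the function names, the toy signal model in `simulate_bin`, and in particular the quality measure (peak height above the spectrum mean) are all hypothetical, since the abstract does not define them.

```python
import cmath
import math


def narrowband_srp_phat(X, freq_hz, mic_x, angles_deg, c=343.0):
    """Narrow-band SRP-PHAT spectrum at a single T-F bin.

    X: complex STFT coefficients, one per microphone.
    mic_x: microphone positions in metres on a linear array (assumed geometry).
    """
    spectrum = []
    for theta in angles_deg:
        cos_t = math.cos(math.radians(theta))
        total = 0.0
        for i in range(len(X)):
            for j in range(i + 1, len(X)):
                cross = X[i] * X[j].conjugate()
                if abs(cross) == 0.0:
                    continue
                phat = cross / abs(cross)  # PHAT weighting: keep phase only
                tau = (mic_x[i] - mic_x[j]) * cos_t / c  # far-field delay
                total += (phat * cmath.exp(-2j * math.pi * freq_hz * tau)).real
        spectrum.append(total)
    return spectrum


def estimate_bin(X, freq_hz, mic_x, angles_deg):
    """Stage 1: one DOA index per T-F bin plus a confidence weight."""
    spectrum = narrowband_srp_phat(X, freq_hz, mic_x, angles_deg)
    k = max(range(len(spectrum)), key=spectrum.__getitem__)
    n_pairs = len(X) * (len(X) - 1) // 2
    # Assumed quality measure: peak height above the spectrum mean.
    mean = sum(spectrum) / len(spectrum)
    quality = max(0.0, (spectrum[k] - mean) / n_pairs)
    return k, quality


def wideband_doas(bins, mic_x, angles_deg, n_sources):
    """Stage 2: weighted histogram of narrow-band DOAs, then peak picking."""
    hist = [0.0] * len(angles_deg)
    for X, freq_hz in bins:
        k, quality = estimate_bin(X, freq_hz, mic_x, angles_deg)
        hist[k] += quality
    peaks = []
    for k, h in enumerate(hist):
        left = hist[k - 1] if k > 0 else float("-inf")
        right = hist[k + 1] if k + 1 < len(hist) else float("-inf")
        if h > 0.0 and h > left and h >= right:
            peaks.append(k)
    peaks.sort(key=lambda k: hist[k], reverse=True)
    return sorted(angles_deg[k] for k in peaks[:n_sources]), hist


def simulate_bin(theta_deg, freq_hz, mic_x, c=343.0):
    """Toy noiseless plane wave from theta_deg at one frequency (assumption)."""
    cos_t = math.cos(math.radians(theta_deg))
    return [cmath.exp(2j * math.pi * freq_hz * x * cos_t / c) for x in mic_x]


# Two speakers with disjoint T-F activity: alternate bins between 60 and 120 degrees.
mic_x = [0.0, 0.04, 0.08]
angles = list(range(0, 181, 5))
bins = [
    (simulate_bin(60 if n % 2 == 0 else 120, f, mic_x), f)
    for n, f in enumerate(range(500, 3001, 250))
]
doas, hist = wideband_doas(bins, mic_x, angles, n_sources=2)
print(doas)
```

In this toy setting each bin is dominated by exactly one source, so the histogram mass concentrates at the two true angles. With reverberation or noise, the narrow-band estimates scatter, and the weights are what keep unreliable bins from contributing as much as reliable ones.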
Pages: 5
Related papers
50 in total
  • [11] Accelerating the SRP-PHAT algorithm on multi- and many-core platforms using OpenCL
    Badia, Jose M.
    Belloch, Jose A.
    Cobos, Maximo
    Igual, Francisco D.
    Quintana-Orti, Enrique S.
    JOURNAL OF SUPERCOMPUTING, 2019, 75 (03): : 1284 - 1297
  • [12] Multi-speaker Direction of Arrival Estimation Using Audio and Visual Modalities with Convolutional Neural Network
    Wu, Yulin
    Hu, Ruimin
    Wang, Xiaochen
    2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME, 2023, : 636 - 641
  • [13] Multimodal speaker diarization for meetings using volume-evaluated SRP-PHAT and video analysis
    P. Cabañas-Molero
    M. Lucena
    J. M. Fuertes
    P. Vera-Candeas
    N. Ruiz-Reyes
    Multimedia Tools and Applications, 2018, 77 : 27685 - 27707
  • [14] Joint estimation of pitch and direction of arrival: improving robustness and accuracy for multi-speaker scenarios
    Gerlach, Stephan
    Bitzer, Joerg
    Goetze, Stefan
    Doclo, Simon
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2014,
  • [15] Neural Network Adaptation and Data Augmentation for Multi-Speaker Direction-of-Arrival Estimation
    He, Weipeng
    Motlicek, Petr
    Odobez, Jean-Marc
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29 : 1303 - 1317
  • [16] Multimodal speaker diarization for meetings using volume-evaluated SRP-PHAT and video analysis
    Cabanas-Molero, P.
    Lucena, M.
    Fuertes, J. M.
    Vera-Candeas, P.
    Ruiz-Reyes, N.
    MULTIMEDIA TOOLS AND APPLICATIONS, 2018, 77 (20) : 27685 - 27707
  • [17] Joint estimation of pitch and direction of arrival: improving robustness and accuracy for multi-speaker scenarios
    Stephan Gerlach
    Jörg Bitzer
    Stefan Goetze
    Simon Doclo
    EURASIP Journal on Audio, Speech, and Music Processing, 2014 (1)
  • [18] GPU-based approaches for real-time sound source localization using the SRP-PHAT algorithm
    Minotto, Vicente Peruffo
    Jung, Claudio Rosito
    da Silveira, Luiz Gonzaga, Jr.
    Lee, Bowon
    INTERNATIONAL JOURNAL OF HIGH PERFORMANCE COMPUTING APPLICATIONS, 2013, 27 (03): : 291 - 306
  • [19] Acoustic Source Localization Using a Geometrically Sampled Grid SRP-PHAT Algorithm With Max-Pooling Operation
    Salvati, Daniele
    Drioli, Carlo
    Foresti, Gian Luca
    IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 1828 - 1832
  • [20] Multi-speaker DoA Estimation Using Audio and Visual Modality
    Yulin Wu
    Ruimin Hu
    Xiaochen Wang
    Shanfa Ke
    Neural Processing Letters, 2023, 55 : 8887 - 8901