MULTIMODAL SPEAKER DIARIZATION OF REAL-WORLD MEETINGS USING D-VECTORS WITH SPATIAL FEATURES

Cited by: 0
Authors
Kang, Wonjune [1]
Roy, Brandon C. [2, 3]
Chow, Wesley [2, 3]
Affiliations
[1] MIT, Cambridge, MA 02139 USA
[2] MIT, Media Lab, Cambridge, MA 02139 USA
[3] Cortico, Boston, MA USA
Keywords
Speaker diarization; d-vector; beamforming; sound source localization; spectral clustering
DOI
10.1109/icassp40776.2020.9053122
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Deep neural network based audio embeddings (d-vectors) have demonstrated superior performance in audio-only speaker diarization compared to traditional acoustic features such as mel-frequency cepstral coefficients (MFCCs) and i-vectors. However, there has been little work on multimodal diarization systems that combine d-vectors with additional sources of information. In this paper, we present a novel approach to multimodal speaker diarization that combines d-vectors with spatial information derived from performing beamforming given a multi-channel microphone array. Our system performs spectral clustering on a combination of speaker embeddings and spatial features that are computed using the Steered-Response Power Phase Transform (SRP-PHAT) algorithm. We evaluate our system on the AMI Meeting Corpus and an internal dataset of real-world conversations. By using both acoustic and spatial features for diarization, we achieve significant improvements over a d-vector only baseline and show potential to achieve comparable results with other state-of-the-art multimodal diarization systems.
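The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each speech segment already has a d-vector and a spatial feature vector (for example, SRP-PHAT power per steering direction), and it fuses the two by a weighted sum of cosine affinities, which is an illustrative choice; the abstract does not specify how the features are combined. The function names, the `weight` parameter, and the small NumPy k-means are all hypothetical.

```python
import numpy as np


def cosine_affinity(x):
    """Pairwise cosine similarity, rescaled from [-1, 1] to [0, 1]."""
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    return 0.5 * (x @ x.T + 1.0)


def kmeans(points, k, iters=50):
    """Small deterministic k-means with farthest-point initialization."""
    centers = [points[0]]
    for _ in range(1, k):
        # Squared distance from every point to its nearest chosen center.
        dist = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(iters):
        sq = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(sq, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels


def spectral_cluster(affinity, k):
    """Normalized spectral clustering on a symmetric affinity matrix."""
    deg = affinity.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    norm_aff = affinity * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order; keep the k largest.
    _, eigvecs = np.linalg.eigh(norm_aff)
    top = eigvecs[:, -k:]
    top = top / np.linalg.norm(top, axis=1, keepdims=True)
    return kmeans(top, k)


def diarize(embeddings, spatial, k, weight=0.5):
    """Assumed fusion: weighted sum of acoustic and spatial affinities."""
    affinity = (weight * cosine_affinity(embeddings)
                + (1.0 - weight) * cosine_affinity(spatial))
    return spectral_cluster(affinity, k)
```

Calling `diarize(embeddings, spatial, k=2)` returns one integer speaker label per segment. In this sketch the number of speakers `k` is supplied by the caller; estimating it automatically is outside its scope.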
Pages: 6509-6513
Number of pages: 5