MULTIMODAL SPEAKER DIARIZATION OF REAL-WORLD MEETINGS USING D-VECTORS WITH SPATIAL FEATURES

Cited by: 0
Authors
Kang, Wonjune [1]
Roy, Brandon C. [2, 3]
Chow, Wesley [2, 3]
Affiliations
[1] MIT, Cambridge, MA 02139 USA
[2] MIT, Media Lab, Cambridge, MA 02139 USA
[3] Cortico, Boston, MA USA
Keywords
Speaker diarization; d-vector; beamforming; sound source localization; spectral clustering
DOI
10.1109/icassp40776.2020.9053122
Chinese Library Classification number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Deep neural network based audio embeddings (d-vectors) have demonstrated superior performance in audio-only speaker diarization compared to traditional acoustic features such as mel-frequency cepstral coefficients (MFCCs) and i-vectors. However, there has been little work on multimodal diarization systems that combine d-vectors with additional sources of information. In this paper, we present a novel approach to multimodal speaker diarization that combines d-vectors with spatial information derived from performing beamforming given a multi-channel microphone array. Our system performs spectral clustering on a combination of speaker embeddings and spatial features that are computed using the Steered-Response Power Phase Transform (SRP-PHAT) algorithm. We evaluate our system on the AMI Meeting Corpus and an internal dataset of real-world conversations. By using both acoustic and spatial features for diarization, we achieve significant improvements over a d-vector only baseline and show potential to achieve comparable results with other state-of-the-art multimodal diarization systems.
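The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the embeddings and spatial features are combined, so the weighted sum of two cosine-affinity matrices below is an assumption, as are the hypothetical names `cosine_affinity`, `spectral_cluster`, `diarize`, and `weight`. SRP-PHAT itself is not computed here; synthetic direction vectors stand in for the spatial features, and random vectors stand in for d-vectors.

```python
import numpy as np


def cosine_affinity(features):
    """Pairwise cosine similarity, rescaled from [-1, 1] to [0, 1]."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)
    return (1.0 + unit @ unit.T) / 2.0


def spectral_cluster(affinity, n_clusters, n_iter=50):
    """Cluster segments given a symmetric, non-negative affinity matrix."""
    degree = affinity.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(np.maximum(degree, 1e-12))
    # Symmetric normalized Laplacian: I - D^{-1/2} A D^{-1/2}
    laplacian = np.eye(len(affinity)) - inv_sqrt[:, None] * affinity * inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(laplacian)  # eigenvalues come back in ascending order
    embedding = eigvecs[:, :n_clusters]
    row_norms = np.linalg.norm(embedding, axis=1, keepdims=True)
    embedding = embedding / np.maximum(row_norms, 1e-12)

    # Small k-means with deterministic farthest-point initialization.
    centers = [embedding[0]]
    for _ in range(1, n_clusters):
        dists = np.min([np.sum((embedding - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(embedding[np.argmax(dists)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((embedding[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for k in range(n_clusters):
            if np.any(labels == k):
                centers[k] = embedding[labels == k].mean(axis=0)
    return labels


def diarize(embeddings, spatial, n_speakers, weight=0.5):
    """Assumed fusion: weighted sum of acoustic and spatial affinities."""
    affinity = (weight * cosine_affinity(embeddings)
                + (1.0 - weight) * cosine_affinity(spatial))
    return spectral_cluster(affinity, n_speakers)


# Toy data: 12 segments from 2 speakers (synthetic, for illustration only).
rng = np.random.default_rng(0)
true_labels = np.array([0] * 6 + [1] * 6)
speaker_means = np.array([[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]])
embeddings = speaker_means[true_labels] + 0.1 * rng.standard_normal((12, 4))
# Stand-in spatial features: unit vectors pointing toward each speaker.
angles = np.deg2rad(np.where(true_labels == 0, 30.0, 150.0))
angles = angles + 0.05 * rng.standard_normal(12)
spatial = np.column_stack([np.cos(angles), np.sin(angles)])

labels = diarize(embeddings, spatial, n_speakers=2)
print(labels)
```

Setting `weight=1.0` in this sketch reduces it to clustering on the embeddings alone, which is one simple way to compare against an embedding-only baseline on the same segments.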
Pages: 6509-6513
Number of pages: 5
Related papers
50 in total
  • [41] Evaluation of handling degrees of missingness in the features of machine learning algorithms to predict overall survival using real-world lung cancer data
    Le, Hoa
    Qu, Pingping
    Xiong, Yan
    Tanaka, Yoko
    PHARMACOEPIDEMIOLOGY AND DRUG SAFETY, 2023, 32 : 423 - 423
  • [42] Real-World Anomaly Detection in Video Using Spatio-Temporal Features Analysis for Weakly Labelled Data with Auto Label Generation
    Nayak, Rikin J.
    Chaudhari, Jitendra P.
    INTERNATIONAL JOURNAL OF ELECTRICAL AND COMPUTER ENGINEERING SYSTEMS, 2023, 14 (05) : 565 - 573
  • [43] Relationships Among Heart Rate, β-Blocker Dosage, and Prognosis in Patients With Coronary Artery Disease in a Real-World Database Using a Multimodal Data Acquisition System
    Oba, Yusuke
    Kabutoya, Tomoyuki
    Kohro, Takahide
    Imai, Yasushi
    Kario, Kazuomi
    Sato, Hisahiko
    Nochioka, Kotaro
    Nakayama, Masaharu
    Fujita, Hideo
    Mizuno, Yoshiko
    Kiyosue, Arihiro
    Iwai, Takamasa
    Miyamoto, Yoshihiro
    Nakano, Yasuhiro
    Nakamura, Taishi
    Tsujita, Kenichi
    Matoba, Tetsuya
    Nagai, Ryozo
    CIRCULATION JOURNAL, 2023, 87 (02) : 336 - +
  • [44] Determinants of cancer incidence and mortality among people with vitamin D deficiency: an epidemiology study using a real-world population database
    Lai, Yi-Chen
    Chen, Yu-Han
    Liang, Fu-Wen
    Wu, Yu-Cih
    Wang, Jhi-Joung
    Lim, Sher-Wei
    Ho, Chung-Han
    FRONTIERS IN NUTRITION, 2023, 10
  • [45] 3D reconstruction of unlimited-size real-world porous media by combining a BicycleGAN-based multimodal dictionary and super-dimension reconstruction
    Li, Yang
    Han, Guanghui
    Jian, Pengpeng
    GEOENERGY SCIENCE AND ENGINEERING, 2023, 228
  • [46] The impact of blood pressure guidelines on the cardiovascular event risk in Japanese patients with coronary artery disease in a real-world database using a multimodal data acquisition system
    Oba, Y.
    Kabutoya, T.
    Kohro, T.
    Imai, Y.
    Kario, K.
    Ishii, M.
    Tsujita, K.
    Fujita, H.
    Matoba, T.
    Kiyosue, A.
    Mizuno, M.
    Nakayama, M.
    Miyamoto, Y.
    Sato, H.
    Nagai, R.
    EUROPEAN HEART JOURNAL, 2024, 45
  • [47] Deep Transfer Learning Using Real-World Image Features for Medical Image Classification, with a Case Study on Pneumonia X-ray Images
    Gu, Chanhoe
    Lee, Minhyeok
    BIOENGINEERING-BASEL, 2024, 11 (04):
  • [48] FIRST REAL-WORLD STUDY ASSESSING HEALTH UTILITY VALUES FOR CHRONIC SPONTANEOUS/IDIOPATHIC URTICARIA USING THE EQ-5D
    McBride, D.
    Chambenoit, O.
    Chiva-Razavi, S.
    Lynde, C.
    Sussman, G.
    Chapman-Rothe, N.
    Weller, K.
    Maurer, M.
    Koenders, J.
    Knulst, A. C.
    Elberink, J. N.
    Halliday, A.
    Alexopoulos, S. T.
    Nakonechna, A.
    Abouzakouk, M.
    Sweeney, C.
    Radder, C.
    Wolin, D.
    Hollis, K.
    Tian, H.
    Balp, M.
    Grattan, C.
    VALUE IN HEALTH, 2015, 18 (07) : A425 - A425
  • [49] Effect of Using the mySugr App on Glycemic Control in T1D patients-A Real-World Analysis from Mexico
    Vulcano, Cristina
    Mikulski, Heather
    Mitter, Michael
    Ruch, Bernhard
    DIABETES, 2023, 72