An Information Theoretic Combination of MFCC and TDOA Features for Speaker Diarization

Cited by: 20
Authors
Vijayasenan, Deepu [1 ]
Valente, Fabio [1 ]
Bourlard, Herve [1 ]
Affiliations
[1] Idiap Res Inst, CH-1920 Martigny, Switzerland
Funding
Swiss National Science Foundation;
Keywords
Feature combination; information bottleneck; meeting data; speaker diarization;
DOI
10.1109/TASL.2010.2048603
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
This correspondence describes a novel system for speaker diarization of meeting recordings based on the combination of acoustic features (MFCC) and time delay of arrival (TDOA) features. The first part of the paper analyzes the differences between MFCC and TDOA features, which possess completely different statistical properties. When Gaussian mixture models (GMMs) are used, experiments reveal that the diarization system is sensitive to the recording scenario (i.e., meeting rooms with varying numbers of microphones). In the second part, a new multistream diarization system is proposed, extending previous work on information theoretic diarization. Both the speaker clustering and speaker realignment steps are discussed; in contrast to current systems, the proposed method avoids performing the feature combination by averaging log-likelihood scores. Experiments on meeting data reveal that the proposed approach outperforms the GMM-based system when the recordings are made with varying numbers of microphones.
Pages: 431-438
Number of pages: 8
Related papers
  • [1] Multistream speaker diarization of meetings recordings beyond MFCC and TDOA features
    Vijayasenan, Deepu
    Valente, Fabio
    Bourlard, Herve
    SPEECH COMMUNICATION, 2012, 54 (01) : 55 - 67
  • [2] Automatic weighting for the combination of TDOA and acoustic features in speaker diarization for meetings
    Anguera, Xavier
    Wooters, Chuck
    Pardo, Jose M.
    Hernando, Javier
    2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3, 2007, : 241 - +
  • [3] Integration of TDOA Features in Information Bottleneck Framework for Fast Speaker Diarization
    Vijayasenan, Deepu
    Valente, Fabio
    Bourlard, Herve
    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 40 - 43
  • [4] An Information Theoretic Approach to Speaker Diarization of Meeting Data
    Vijayasenan, Deepu
    Valente, Fabio
    Bourlard, Herve
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2009, 17 (07): : 1382 - 1393
  • [5] Selection of TDOA Parameters for MDM Speaker Diarization
    Martinez-Gonzalez, Beatriz
    Pardo, Jose M.
    Echeverry-Correa, Julian D.
    Vallejo-Pinto, Jose A.
    Barra-Chicote, Roberto
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 2155 - 2158
  • [6] LDA combination of pitch and MFCC features in speaker recognition
    Harrag, A
    Mohamadi, T
    Serignat, JF
    INDICON 2005 Proceedings, 2005, : 237 - 240
  • [7] Statistical Speaker Diarization Using Dependent Combination of Extracted Features
    Almgotir-Kadhimi, Hasan
    Woo, Lok
    Dlay, Satnam
    2015 THIRD INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE, MODELLING AND SIMULATION (AIMS 2015), 2015, : 291 - 296
  • [8] Speaker identification based on combination of MFCC and UMRT based features
    Antony, Anett
    Gopikakumari, R.
    8TH INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING & COMMUNICATIONS (ICACC-2018), 2018, 143 : 250 - 257
  • [9] Using Information Theoretic Vector Quantization for Inverted MFCC based Speaker Verification
    Memon, Sheeraz
    Lech, Margaret
    He, Ling
    2009 2ND INTERNATIONAL CONFERENCE ON COMPUTER, CONTROL AND COMMUNICATION, 2009, : 181 - 185
  • [10] Speaker Diarization with Lexical Information
    Park, Tae Jin
    Han, Kyu J.
    Huang, Jing
    He, Xiaodong
    Zhou, Bowen
    Georgiou, Panayiotis
    Narayanan, Shrikanth
    INTERSPEECH 2019, 2019, : 391 - 395