AN EM ALGORITHM FOR JOINT SOURCE SEPARATION AND DIARISATION OF MULTICHANNEL CONVOLUTIVE SPEECH MIXTURES

Cited by: 0
Authors
Kounades-Bastian, Dionyssos [1]
Girin, Laurent [1, 2]
Alameda-Pineda, Xavier [3]
Gannot, Sharon [4]
Horaud, Radu [1]
Affiliations
[1] INRIA Grenoble Rhone Alpes, Montbonnot St Martin, France
[2] Univ Grenoble Alpes, GIPSA Lab, Grenoble, France
[3] Univ Trento, Trento, France
[4] Bar Ilan Univ, Fac Engn, Ramat Gan, Israel
Source
2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2017
Funding
European Union Seventh Framework Programme (FP7);
Keywords
Audio source separation; speaker diarisation; local Gaussian model; NONNEGATIVE MATRIX FACTORIZATION; SPEAKER DIARIZATION; INFORMATION;
DOI
Not available
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206; 082403;
Abstract
We present a probabilistic model for joint source separation and diarisation of multichannel convolutive speech mixtures. We build upon the local Gaussian model (LGM) framework with non-negative matrix factorization (NMF). Diarisation is introduced as a temporal labeling of each source in the mixture as active or inactive at the short-term frame level. We devise an EM algorithm in which the source separation process is aided by the diarisation state, since the latter indicates which sources are actually present in the mixture. The diarisation state is tracked with a hidden Markov model (HMM) whose emission probabilities are calculated from the estimated source signals. The proposed EM algorithm achieves separation performance comparable with a state-of-the-art LGM-NMF method, while outperforming a state-of-the-art speaker diarisation pipeline.
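To make the HMM-based tracking of the diarisation state more concrete, the following is a minimal sketch of forward-backward smoothing for a single source with two states (inactive, active). It is not the authors' algorithm: the function name `diarisation_posteriors`, the symmetric transition matrix, and the parameters `stay_prob` and `prior_active` are all illustrative assumptions, and the emission likelihoods are simply taken as given (strictly positive) numbers rather than being computed from estimated source signals as the abstract describes.

```python
from typing import List, Sequence


def diarisation_posteriors(
    emission: Sequence[Sequence[float]],
    stay_prob: float = 0.9,
    prior_active: float = 0.5,
) -> List[float]:
    """Smoothed probability that one source is active in each frame.

    emission[t] = (likelihood of frame t if inactive, likelihood if active).
    All names and default values here are hypothetical, for illustration only.
    """
    n_frames = len(emission)
    if n_frames == 0:
        return []
    # Assumed symmetric transitions: trans[i][j] = p(state j | state i).
    trans = [[stay_prob, 1.0 - stay_prob], [1.0 - stay_prob, stay_prob]]
    prior = [1.0 - prior_active, prior_active]

    # Forward pass, normalised per frame to avoid numerical underflow.
    alpha: List[List[float]] = []
    for t in range(n_frames):
        if t == 0:
            pred = prior
        else:
            prev = alpha[-1]
            pred = [sum(prev[i] * trans[i][j] for i in range(2)) for j in range(2)]
        unnorm = [pred[j] * emission[t][j] for j in range(2)]
        norm = sum(unnorm)
        alpha.append([u / norm for u in unnorm])

    # Backward pass, also normalised per frame.
    beta = [[1.0, 1.0] for _ in range(n_frames)]
    for t in range(n_frames - 2, -1, -1):
        unnorm = [
            sum(trans[i][j] * emission[t + 1][j] * beta[t + 1][j] for j in range(2))
            for i in range(2)
        ]
        norm = sum(unnorm)
        beta[t] = [u / norm for u in unnorm]

    # Combine both passes and keep p(active) for each frame.
    posteriors = []
    for t in range(n_frames):
        joint = [alpha[t][s] * beta[t][s] for s in range(2)]
        posteriors.append(joint[1] / (joint[0] + joint[1]))
    return posteriors
```

In a joint scheme of the kind the abstract outlines, posteriors like these would indicate which sources are present in each frame and could inform the separation step; how exactly that coupling is done is specified in the paper, not in this sketch.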
Pages: 16-20
Page count: 5