AN EM ALGORITHM FOR JOINT SOURCE SEPARATION AND DIARISATION OF MULTICHANNEL CONVOLUTIVE SPEECH MIXTURES

Cited by: 0
Authors
Kounades-Bastian, Dionyssos [1]
Girin, Laurent [1, 2]
Alameda-Pineda, Xavier [3]
Gannot, Sharon [4]
Horaud, Radu [1]
Affiliations
[1] INRIA Grenoble Rhone Alpes, Montbonnot St Martin, France
[2] Univ Grenoble Alpes, GIPSA Lab, Grenoble, France
[3] Univ Trento, Trento, France
[4] Bar Ilan Univ, Fac Engn, Ramat Gan, Israel
Source
2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2017
Funding
European Union Seventh Framework Programme (FP7);
Keywords
Audio source separation; speaker diarisation; local Gaussian model; NONNEGATIVE MATRIX FACTORIZATION; SPEAKER DIARIZATION; INFORMATION;
DOI
Not available
Chinese Library Classification
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
We present a probabilistic model for joint source separation and diarisation of multichannel convolutive speech mixtures. We build upon the local Gaussian model (LGM) framework with non-negative matrix factorization (NMF). Diarisation is introduced as a temporal labeling of each source in the mixture as active or inactive at the short-term frame level. We devise an EM algorithm in which the source separation process is aided by the diarisation state, since the latter indicates which sources are actually present in the mixture. The diarisation state is tracked with a Hidden Markov Model (HMM) whose emission probabilities are calculated from the estimated source signals. The proposed EM algorithm achieves separation performance comparable to a state-of-the-art LGM-NMF method, while outperforming a state-of-the-art speaker diarisation pipeline.
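The HMM tracking step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a single source, a two-state (inactive/active) chain, and a zero-mean Gaussian emission on per-frame power with hand-picked variances, whereas the paper computes emissions from the source signals estimated inside the EM. The function names `gaussian_log_emission` and `activity_posteriors` and all parameter values are hypothetical.

```python
import numpy as np


def gaussian_log_emission(frame_power, var_inactive=1e-3, var_active=1.0):
    """Per-frame log-likelihoods under two zero-mean Gaussians (assumed model).

    Column 0 is the inactive state, column 1 the active state.
    """
    variances = np.array([var_inactive, var_active])
    power = np.asarray(frame_power, dtype=float)[:, None]
    return -0.5 * (np.log(2.0 * np.pi * variances)[None, :] + power / variances[None, :])


def activity_posteriors(log_emission, transition, initial):
    """Forward-backward smoothing for a two-state HMM.

    log_emission: (T, 2) per-frame log-likelihoods.
    transition:   (2, 2) row-stochastic matrix, transition[i, j] = P(j | i).
    initial:      (2,) prior over the state of the first frame.
    Returns a (T, 2) array of posterior state probabilities.
    """
    n_frames = log_emission.shape[0]
    # Subtract the per-frame maximum before exponentiating for stability;
    # the per-frame scale cancels in the normalised posteriors.
    emission = np.exp(log_emission - log_emission.max(axis=1, keepdims=True))

    # Forward pass, normalised at every frame.
    alpha = np.zeros((n_frames, 2))
    alpha[0] = initial * emission[0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, n_frames):
        alpha[t] = (alpha[t - 1] @ transition) * emission[t]
        alpha[t] /= alpha[t].sum()

    # Backward pass, normalised at every frame.
    beta = np.ones((n_frames, 2))
    for t in range(n_frames - 2, -1, -1):
        beta[t] = transition @ (emission[t + 1] * beta[t + 1])
        beta[t] /= beta[t].sum()

    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)


# Toy input: five quiet frames, five loud frames, five quiet frames.
frame_power = np.array([5e-4] * 5 + [1.0] * 5 + [5e-4] * 5)
transition = np.array([[0.9, 0.1], [0.1, 0.9]])  # "sticky" states (assumed values)
initial = np.array([0.5, 0.5])

posteriors = activity_posteriors(gaussian_log_emission(frame_power), transition, initial)
is_active = posteriors[:, 1] > 0.5
```

In a joint scheme such as the one the abstract describes, posteriors of this kind would feed back into the separation step so that sources labeled inactive in a frame contribute little to the mixture model for that frame; how that coupling is done is specific to the paper and is not reproduced here.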
Pages: 16-20
Number of pages: 5