DISTRIBUTED MICROPHONE ARRAY PROCESSING FOR SPEECH SOURCE SEPARATION WITH CLASSIFIER FUSION

Cited: 0
Authors
Souden, Mehrez [1]
Kinoshita, Keisuke [1]
Delcroix, Marc [1]
Nakatani, Tomohiro [1]
Affiliation
[1] NTT Commun Sci Labs, Kyoto, Japan
Source
2012 IEEE INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING (MLSP) | 2012
Keywords
Distributed microphone array processing; blind source separation; speech clustering; classifier combination; BLIND; MIXTURES
DOI
Not available
CLC Classification
TP [Automation technology, computer technology]
Discipline Classification Code
0812
Abstract
We propose a new approach for clustering and separating competing speech signals using a distributed microphone array (DMA). This approach can be viewed as an extension of expectation-maximization (EM)-based source separation to DMAs. To achieve distributed processing, we assume the conditional independence (with respect to sources' activities) of the normalized recordings of different nodes. By doing so, only the posterior probabilities of sources' activities need to be shared between nodes. Consequently, the EM algorithm is formulated such that, in the expectation step, posterior probabilities are estimated locally and shared between nodes. In the maximization step, every node fuses the received probabilities via either the product or the sum rule and estimates its local parameters. We show that, even if we make binary decisions (presence/absence of speech) during the EM iterations instead of transmitting continuous posterior probability values, we can achieve separation without causing significant speech distortion. Our preliminary investigations demonstrate that the proposed processing technique approaches the centralized solution and can outperform the oracle best node-wise clustering in terms of objective source separation metrics.
Pages: 6
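
As an illustration of the classifier-fusion step described in the abstract, the following minimal Python sketch combines per-node posterior probabilities of source activity with the product or sum rule and, optionally, turns them into binary presence/absence decisions per time-frequency bin. The variable names, array shapes, and normalization details are assumptions made for this example and are not the authors' implementation.

import numpy as np

def fuse_posteriors(node_posteriors, rule="product"):
    # node_posteriors: assumed shape (nodes, sources, frames, freqs), where each
    # node's posteriors sum to 1 over the source axis for every time-frequency bin.
    p = np.asarray(node_posteriors, dtype=float)
    if rule == "product":
        fused = np.prod(p, axis=0)   # product rule: multiply node-wise posteriors
    elif rule == "sum":
        fused = np.mean(p, axis=0)   # sum rule: average node-wise posteriors
    else:
        raise ValueError("rule must be 'product' or 'sum'")
    # Renormalize over the source axis so the fused values are again posteriors.
    return fused / (fused.sum(axis=0, keepdims=True) + 1e-12)

def binarize(posteriors):
    # Hard decisions (speech present/absent): keep only the most likely source per bin.
    winners = np.argmax(posteriors, axis=0)
    hard = np.zeros_like(posteriors)
    np.put_along_axis(hard, winners[None, ...], 1.0, axis=0)
    return hard

# Toy usage: 2 nodes, 2 sources, 1 frame, 3 frequency bins.
rng = np.random.default_rng(0)
p = rng.random((2, 2, 1, 3))
p /= p.sum(axis=1, keepdims=True)    # per-node normalization over sources
masks = binarize(fuse_posteriors(p, rule="product"))

In the scheme the abstract describes, each node would estimate such posteriors locally in its E-step, exchange them (or their binarized versions) with the other nodes, and use the fused values to update its local parameters in the M-step.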