A Pitch State Dependent Dictionary Design Method for Single-Channel Speech Separation

Cited by: 0

Authors:

Guo, Haiyan [1,2]

Yang, Zhen [1]

Zhang, Linghua [1]

Ye, Lei [1]

Affiliations:

[1] Nanjing Univ Posts & Telecommun, Minist Educ, Key Lab Broadband Wireless Commun & Sensor Networ, Nanjing, Jiangsu, Peoples R China

[2] Southeast Univ, Sch Informat Sci & Engn, Nanjing, Jiangsu, Peoples R China

Source:

2016 8TH INTERNATIONAL CONFERENCE ON WIRELESS COMMUNICATIONS & SIGNAL PROCESSING (WCSP) | 2016

Keywords:

speech separation; sparse decomposition; dictionary; data mining;

DOI:

Not available

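Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronic and communication technology];

Discipline classification codes: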

0808 ; 0809 ;

Abstract:

In this paper, we propose a new pitch-state-dependent dictionary for single-channel speech separation. The dictionary is designed in two stages: sub-dictionary learning and sub-dictionary concatenation. In sub-dictionary learning, pitch state information is used to learn a set of discriminative sub-dictionaries for each speaker in the time domain. Specifically, each sub-dictionary is a matrix whose columns are the speaker's training frames with similar pitch states. Moreover, we use a frequent pattern mining method to further reduce the sub-dictionary size. In sub-dictionary concatenation, we propose selecting an appropriate weight pair to match the learned sub-dictionaries and generate a dictionary for separation. Experimental results show that the proposed method achieves better overall performance than two dictionary-based methods and a source-filter-based method that also uses pitch information.
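The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of integer pitch-state labels, and the reading of the "weight pair" as one scalar per sub-dictionary are all assumptions made for illustration. The frequent-pattern-mining size reduction and the sparse decomposition step are left out.

```python
from collections import defaultdict


def build_sub_dictionaries(frames, pitch_states):
    """Group time-domain training frames by pitch state (hypothetical helper).

    frames: list of equal-length lists of samples.
    pitch_states: one hashable label per frame (assumed quantized pitch state).
    Returns {state: matrix}; each matrix is a list of rows whose columns
    are the training frames that share that pitch state.
    """
    groups = defaultdict(list)
    for frame, state in zip(frames, pitch_states):
        groups[state].append(frame)
    # Transpose each group so every frame becomes one column (atom).
    return {state: [list(row) for row in zip(*atoms)]
            for state, atoms in groups.items()}


def concatenate(sub_a, sub_b, weights=(1.0, 1.0)):
    """Join two sub-dictionaries column-wise, scaling each by its weight.

    Assumption: the weight pair is a pair of scalars, one per speaker.
    """
    w_a, w_b = weights
    if len(sub_a) != len(sub_b):
        raise ValueError("sub-dictionaries must share the frame length")
    return [[w_a * x for x in row_a] + [w_b * x for x in row_b]
            for row_a, row_b in zip(sub_a, sub_b)]


# Toy data: speaker A has three 2-sample frames, speaker B has one.
subs_a = build_sub_dictionaries([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0, 0, 1])
subs_b = build_sub_dictionaries([[1.0, 1.0]], [2])
dictionary = concatenate(subs_a[0], subs_b[2], weights=(0.5, 2.0))
```

Here `dictionary` has two rows (the frame length) and three columns: two weighted atoms from speaker A's pitch state 0 followed by one weighted atom from speaker B's pitch state 2. A mixed frame would then be sparsely decomposed over this matrix, a step this sketch does not cover.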

Pages: 5

References: 27 in total

