A Pitch State Dependent Dictionary Design Method for Single-Channel Speech Separation

Cited by: 0
Authors
Guo, Haiyan [1 ,2 ]
Yang, Zhen [1 ]
Zhang, Linghua [1 ]
Ye, Lei [1 ]
Affiliations
[1] Nanjing Univ Posts & Telecommun, Minist Educ, Key Lab Broadband Wireless Commun & Sensor Networ, Nanjing, Jiangsu, Peoples R China
[2] Southeast Univ, Sch Informat Sci & Engn, Nanjing, Jiangsu, Peoples R China
Source
2016 8TH INTERNATIONAL CONFERENCE ON WIRELESS COMMUNICATIONS & SIGNAL PROCESSING (WCSP) | 2016
Keywords
speech separation; sparse decomposition; dictionary; data mining;
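DOI
Not available
Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification codes
0808 ; 0809 ;
Abstract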
In this paper, we propose a new pitch-state-dependent dictionary for single-channel speech separation. The dictionary is designed in two stages: sub-dictionary learning and sub-dictionary concatenation. In sub-dictionary learning, pitch state information is used to learn a set of discriminative sub-dictionaries for each speaker in the time domain. Specifically, each sub-dictionary is a matrix whose columns are the speaker's training frames that share similar pitch states. A frequent pattern mining method is then applied to further reduce the sub-dictionary size. In sub-dictionary concatenation, an appropriate weight pair is selected to match the learned sub-dictionaries and generate the dictionary used for separation. Experimental results show that the proposed method achieves better overall performance than two dictionary-based methods and a source-filter-based method that also uses pitch information.
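The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the uniform pitch quantization, the unit-norm column scaling, the interpretation of the weight pair as per-speaker scaling factors, and all function names and parameters (`pitch_states`, `build_sub_dictionaries`, `concatenate`, `n_states`, `fmin`, `fmax`) are assumptions made for illustration. The frequent pattern mining step that shrinks each sub-dictionary is not covered.

```python
import numpy as np

def pitch_states(pitch_hz, n_states=4, fmin=80.0, fmax=400.0):
    """Map per-frame pitch values (Hz) to discrete states; 0 marks unvoiced frames.

    Assumption: uniform quantization between fmin and fmax.
    """
    pitch_hz = np.asarray(pitch_hz, dtype=float)
    edges = np.linspace(fmin, fmax, n_states + 1)
    states = np.digitize(pitch_hz, edges[1:-1]) + 1  # voiced states 1..n_states
    states[pitch_hz <= 0] = 0                        # unvoiced frames
    return states

def build_sub_dictionaries(frames, states):
    """Group time-domain frames by pitch state.

    Returns {state: matrix whose columns are the frames of that state}.
    Assumption: columns are scaled to unit norm.
    """
    frames = np.asarray(frames, dtype=float)
    subdicts = {}
    for s in np.unique(states):
        cols = frames[states == s].T          # shape (frame_len, n_frames_in_state)
        norms = np.linalg.norm(cols, axis=0)
        norms[norms == 0] = 1.0               # leave all-zero frames unchanged
        subdicts[int(s)] = cols / norms
    return subdicts

def concatenate(sub_a, sub_b, weights=(1.0, 1.0)):
    """Concatenate one sub-dictionary per speaker, scaled by a weight pair (assumed meaning)."""
    w_a, w_b = weights
    return np.hstack([w_a * sub_a, w_b * sub_b])

# Toy data: 4 frames of length 3, with one pitch value per frame.
frames = np.arange(1, 13, dtype=float).reshape(4, 3)
states = pitch_states([0.0, 100.0, 110.0, 300.0])
subdicts = build_sub_dictionaries(frames, states)
dictionary = concatenate(subdicts[1], subdicts[3], weights=(0.5, 2.0))
```

In a real two-speaker setting, `sub_a` and `sub_b` would come from different speakers' training data, and the resulting matrix would be passed to a sparse decomposition solver to separate the mixture frame.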
Pages: 5
Related papers
27 in total
[1]   K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation [J].
Aharon, Michal ;
Elad, Michael ;
Bruckstein, Alfred .
IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2006, 54 (11) :4311-4322
[2]  
[Anonymous], 1995, Speech coding and synthesis
[3]   Learning a Discriminative Dictionary for Single-Channel Speech Separation [J].
Bao, Guangzhao ;
Xu, Yangfei ;
Ye, Zhongfu .
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2014, 22 (07) :1130-1138
[4]   A Compressed Sensing Approach to Blind Separation of Speech Mixture Based on a Two-Layer Sparsity Model [J].
Bao, Guangzhao ;
Ye, Zhongfu ;
Xu, Xu ;
Zhou, Yingyue .
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2013, 21 (05) :899-906
[5]   Audio source separation with a single sensor [J].
Benaroya, L ;
Bimbot, F ;
Gribonval, R .
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2006, 14 (01) :191-199
[6]  
Benaroya L, 2003, 2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL VI, PROCEEDINGS, P613
[7]   An audio-visual corpus for speech perception and automatic speech recognition (L) [J].
Cooke, Martin ;
Barker, Jon ;
Cunningham, Stuart ;
Shao, Xu .
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2006, 120 (05) :2421-2424
[8]  
Han J, 2012, MOR KAUF D, P1
[9]  
Jang G.-J., 2003, J MACH LEARN RES, V4, P1365
[10]   Learning the parts of objects by non-negative matrix factorization [J].
Lee, DD ;
Seung, HS .
NATURE, 1999, 401 (6755) :788-791