Spectro-temporal Modulation Based Singing Detection Combined with Pitch based Grouping for Singing Voice Separation

Authors
Lin, Tse-En [1]
Hsu, Chung-Chien [1]
Chen, Yi-Cheng [2]
Chen, Jian-Hueng [2]
Chi, Tai-Shih [1]
Affiliations
[1] Natl Chiao Tung Univ, Dept Elect & Comp Engn, Hsinchu 30050, Taiwan
[2] Chunghwa Telecom Co Ltd, Telecommun Labs, Taipei, Taiwan
Source
14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5 | 2013
Keywords
singing voice detection; singing voice separation; spectro-temporal modulation; pitch tracking; monaural recordings; speech
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
A spectro-temporal modulation-based singing voice detector cascaded with a Viterbi-based pitch tracking algorithm is proposed in this paper for separating the singing voice from monaural recordings. To detect the singing voice, the spectro-temporal modulation energy related to voice harmonics is extracted using a spectro-temporal modulation analysis framework developed for the Fourier spectrogram. The singing voice is separated from the background music using a binary mask that groups the estimated harmonics of the singing voice. The proposed system is evaluated on the MIR-1K dataset and is shown to outperform three other binary-mask-based systems in the vocal/music separation task.
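The separation stage summarized in the abstract (Viterbi pitch tracking followed by binary-mask grouping of the estimated harmonics) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract specifies neither the Viterbi transition cost nor the exact mask rule, so the linear jump penalty, the per-harmonic bin tolerance, and the function names `viterbi_pitch_track` and `harmonic_binary_mask` are hypothetical. The `voiced` flags stand in for the frame decisions of the spectro-temporal modulation singing detector, which is not implemented here.

```python
from typing import List, Sequence


def viterbi_pitch_track(salience: Sequence[Sequence[float]],
                        jump_penalty: float = 1.0) -> List[int]:
    """Pick one pitch-candidate bin per frame by dynamic programming.

    Maximizes sum_t salience[t][bin_t] - jump_penalty * |bin_t - bin_{t-1}|.
    The linear jump penalty is a hypothetical transition cost.
    """
    n_bins = len(salience[0])
    score = list(salience[0])          # best path score ending in each bin
    backptrs: List[List[int]] = []     # best predecessor bin per frame and bin
    for frame in salience[1:]:
        new_score, ptr = [], []
        for j in range(n_bins):
            # Best previous bin i for arriving at bin j in this frame.
            best_i = max(range(n_bins),
                         key=lambda i: score[i] - jump_penalty * abs(i - j))
            new_score.append(score[best_i] - jump_penalty * abs(best_i - j)
                             + frame[j])
            ptr.append(best_i)
        score = new_score
        backptrs.append(ptr)
    # Backtrack from the best-scoring final bin.
    path = [max(range(n_bins), key=lambda j: score[j])]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


def harmonic_binary_mask(f0_hz: Sequence[float], voiced: Sequence[bool],
                         n_freq_bins: int, bin_hz: float,
                         tolerance_bins: int = 1) -> List[List[int]]:
    """Set mask bins to 1 around every harmonic of f0 in voiced frames.

    `voiced` stands in for the singing detector's frame decisions;
    `tolerance_bins` is a hypothetical width around each harmonic.
    """
    mask = [[0] * n_freq_bins for _ in f0_hz]
    for t, (f0, is_voiced) in enumerate(zip(f0_hz, voiced)):
        if not is_voiced or f0 <= 0:
            continue  # unvoiced frames get no vocal bins (all zeros)
        h = 1
        while True:
            center = round(h * f0 / bin_hz)  # bin of the h-th harmonic
            if center - tolerance_bins >= n_freq_bins:
                break
            for k in range(center - tolerance_bins,
                           center + tolerance_bins + 1):
                if 0 <= k < n_freq_bins:
                    mask[t][k] = 1
            h += 1
    return mask
```

With `jump_penalty=1.0` the tracker holds a steady pitch bin through a single frame in which a competing peak is slightly stronger, whereas `jump_penalty=0.0` reduces to picking the per-frame maximum. In a full system the resulting mask would be applied element-wise to the mixture's magnitude spectrogram before resynthesis; that step, like the cost function above, is an assumption rather than a detail taken from the paper.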
Pages: 2919-2922
Number of pages: 4