Spectro-temporal Modulation Based Singing Detection Combined with Pitch based Grouping for Singing Voice Separation

Cited by: 0

Authors:

Lin, Tse-En [1]

Hsu, Chung-Chien [1]

Chen, Yi-Cheng [2]

Chen, Jian-Hueng [2]

Chi, Tai-Shih [1]

Affiliations:

[1] National Chiao Tung University, Department of Electrical and Computer Engineering, Hsinchu 30050, Taiwan

[2] Chunghwa Telecom Co., Ltd., Telecommunication Laboratories, Taipei, Taiwan

Source:

14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5 | 2013

Keywords:

singing voice detection; singing voice separation; spectro-temporal modulation; pitch tracking; monaural recordings; speech

DOI:

Not available

Chinese Library Classification:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

This paper proposes a spectro-temporal modulation based singing voice detector cascaded with a Viterbi based pitch tracking algorithm for separating the singing voice from monaural recordings. To detect the singing voice, the spectro-temporal modulation energy related to voice harmonics is extracted using a spectro-temporal modulation analysis framework developed for the Fourier spectrogram. The singing voice is then separated from the background music by a binary mask that groups the estimated harmonics of the singing voice. The proposed system is evaluated on the MIR-1K dataset and is shown to outperform three other binary-mask based systems in the vocal/music separation task.
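The abstract does not spell out how the binary mask is built, so the following is only a minimal sketch of the general idea of grouping harmonics with a binary mask, not the authors' implementation. It assumes a per-frame pitch track is already available (0 marking frames with no detected singing) and marks every spectrogram bin that lies within a fixed tolerance of a harmonic of that pitch. The function names `harmonic_binary_mask` and `apply_mask`, the `tol_hz` parameter, and the fixed-width tolerance rule are all hypothetical choices made for illustration.

```python
import math


def harmonic_binary_mask(f0_track, n_bins, sample_rate, n_fft, tol_hz=40.0):
    """Return mask[frame][bin] = 1 where the bin lies near a harmonic of f0.

    Hypothetical helper: f0_track holds one pitch estimate (Hz) per frame,
    with a value <= 0 meaning "no singing voice detected in this frame".
    """
    bin_hz = sample_rate / n_fft      # frequency spacing between FFT bins
    nyquist = sample_rate / 2.0
    mask = []
    for f0 in f0_track:
        row = [0] * n_bins
        if f0 > 0:
            k = 1
            while k * f0 <= nyquist:  # walk through harmonics k * f0
                lo = max(0, math.ceil((k * f0 - tol_hz) / bin_hz))
                hi = min(n_bins - 1, math.floor((k * f0 + tol_hz) / bin_hz))
                for b in range(lo, hi + 1):
                    row[b] = 1        # bin is assigned to the singing voice
                k += 1
        mask.append(row)
    return mask


def apply_mask(spectrogram, mask):
    """Keep masked bins and zero out the rest (element-wise product)."""
    return [[s * m for s, m in zip(s_row, m_row)]
            for s_row, m_row in zip(spectrogram, mask)]


# Two frames: the first has no singing, the second has a 200 Hz pitch.
mask = harmonic_binary_mask([0.0, 200.0], n_bins=257,
                            sample_rate=16000, n_fft=512, tol_hz=20.0)
voice = apply_mask([[1.0] * 257, [2.0] * 257], mask)
```

Applying the complementary mask (1 minus each entry) to the same spectrogram would give a corresponding estimate of the background music.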


Pages: 2919-2922

Page count: 4
