A Tandem Algorithm for Singing Pitch Extraction and Voice Separation From Music Accompaniment

Cited by: 44
Authors
Hsu, Chao-Ling [1]
Wang, DeLiang [2, 3]
Jang, Jyh-Shing Roger [4]
Hu, Ke [2, 3]
Affiliations
[1] Mediatek Inc, Hsinchu 30078, Taiwan
[2] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
[3] Ohio State Univ, Ctr Cognit Sci, Columbus, OH 43210 USA
[4] Natl Tsing Hua Univ, Dept Comp Sci, Hsinchu 30013, Taiwan
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2012, Vol. 20, No. 5
Keywords
Computational auditory scene analysis (CASA); iterative procedure; pitch extraction; singing voice separation; tandem algorithm; SPEECH; MODELS; MELODY;
DOI
10.1109/TASL.2011.2182510
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Singing pitch estimation and singing voice separation are challenging due to the presence of music accompaniments that are often nonstationary and harmonic. Inspired by computational auditory scene analysis (CASA), this paper investigates a tandem algorithm that estimates the singing pitch and separates the singing voice jointly and iteratively. Rough pitches are first estimated and then used to separate the target singer by considering harmonicity and temporal continuity. The separated singing voice and estimated pitches are used to improve each other iteratively. To enhance the performance of the tandem algorithm for dealing with musical recordings, we propose a trend estimation algorithm to detect the pitch ranges of a singing voice in each time frame. The detected trend substantially reduces the difficulty of singing pitch detection by removing a large number of wrong pitch candidates produced either by musical instruments or by the overtones of the singing voice. Systematic evaluation shows that the tandem algorithm outperforms previous systems for pitch extraction and singing voice separation.
Pages: 1482-1491
Page count: 10