Melody extraction from music using modified group delay functions

Cited by: 4
Authors
Rajan R. [1]
Misra M. [2]
Murthy H.A. [1]
Affiliations
[1] Department of Computer Science and Engineering, Indian Institute of Technology, Madras, Chennai
[2] Center for Computer Research in Music and Acoustics, Stanford University, Stanford, CA
Keywords
Group delay; Modified group delay-source; Modified group delay-system; Pitch extraction for music
DOI
10.1007/s10772-017-9397-1
Abstract
Modified group delay based algorithms for estimating melodic pitch sequences from heterophonic/polyphonic music are discussed in this paper. Two variants of the modified group delay function are proposed: (a) system based, MODGD (Direct), and (b) source based, MODGD (Source). In (a), the standard modified group delay function (MODGDF) is used to estimate the prominent melodic pitch (f0), which appears as a low-frequency formant in the MODGDF spectrum. In (b), the power spectrum of the signal is first flattened to emphasise the source. The flattened power spectrum behaves like a sinusoid in noise, the frequency of the sinusoid being related to the pitch frequency. The modified group delay function of this signal produces peaks at T0, 2 T0, ..., where T0 = 1/f0. Continuity constraints in a dynamic programming framework are imposed across frames to reduce octave errors. Sudden changes in pitch are accommodated by changing the frame size dynamically using a multi-resolution framework. The proposed systems were evaluated on four datasets: ADC-2004, LabROSA, MIREX-2008 and a Carnatic music dataset. Their performance demonstrates the potential of group delay based methods for melody extraction. © 2017, Springer Science+Business Media New York.
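The abstract's idea of imposing continuity constraints through dynamic programming to reduce octave errors can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `track_pitch`, the candidate format (pairs of frequency and salience per frame), and the transition cost (absolute pitch jump in octaves, scaled by a hypothetical `jump_penalty`) are all assumptions chosen for illustration.

```python
import math


def track_pitch(candidates, jump_penalty=1.0):
    """Pick one f0 per frame with Viterbi-style dynamic programming.

    candidates: list of frames, each a non-empty list of (f0_hz, salience).
    jump_penalty: assumed cost per octave of pitch change between frames.
    """
    # best[t][i] is the best cumulative score of a path ending at
    # candidate i of frame t.
    best = [[salience for _, salience in candidates[0]]]
    back = []  # back[t - 1][i] is the best predecessor index in frame t - 1
    for t in range(1, len(candidates)):
        scores, pointers = [], []
        for f0, salience in candidates[t]:
            # Score of reaching this candidate from each previous candidate,
            # penalising the size of the jump measured in octaves.
            options = [
                best[-1][j] - jump_penalty * abs(math.log2(f0 / prev_f0))
                for j, (prev_f0, _) in enumerate(candidates[t - 1])
            ]
            j_best = max(range(len(options)), key=options.__getitem__)
            scores.append(options[j_best] + salience)
            pointers.append(j_best)
        best.append(scores)
        back.append(pointers)

    # Backtrack from the best final candidate to recover the path.
    i = max(range(len(best[-1])), key=best[-1].__getitem__)
    path = [i]
    for pointers in reversed(back):
        i = pointers[i]
        path.append(i)
    path.reverse()
    return [candidates[t][i][0] for t, i in enumerate(path)]


# Toy input: the middle frame has a stronger peak one octave too high.
frames = [
    [(220.0, 1.0), (440.0, 0.6)],
    [(220.0, 0.8), (440.0, 1.0)],
    [(220.0, 1.0), (440.0, 0.5)],
]
print(track_pitch(frames))
```

In this toy input, picking the most salient candidate in each frame independently would jump up an octave in the middle frame; with the assumed penalty of 1.0 per octave, the smoother path at 220 Hz scores higher overall.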
Pages: 185-204
Page count: 19