Comparing Deep Models and Evaluation Strategies for Multi-Pitch Estimation in Music Recordings

Cited by: 4
Authors
Weiss, Christof [1]
Peeters, Geoffroy [1]
Affiliations
[1] Inst Polytech Paris, Telecom Paris, LTCI, F-91764 Palaiseau, France
Keywords
Task analysis; Multiple signal classification; Instruments; Estimation; Training; Annotations; Speech processing; Music information retrieval; Music transcription; U-net; Generalization; Cross-version evaluation; Transcription
DOI
10.1109/TASLP.2022.3200547
Chinese Library Classification
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Extracting pitch information from music recordings is a challenging but important problem in music signal processing. Frame-wise transcription, or multi-pitch estimation, aims to detect the simultaneous activity of pitches in polyphonic music recordings and has recently seen major improvements thanks to deep-learning techniques, with a variety of proposed model architectures. In this paper, we compare different architectures based on convolutional neural networks, the U-net structure, and self-attention components. We propose several modifications to these architectures, including self-attention modules for skip connections, recurrent layers to replace the self-attention, and a multi-task strategy with simultaneous prediction of the degree of polyphony. We compare variants of these architectures in different sizes for multi-pitch estimation, focusing on Western classical music beyond the piano-solo scenario using the MusicNet and Schubert Winterreise datasets. Our experiments indicate that most architectures yield competitive results and that larger model variants seem to be beneficial. However, we find that these results depend substantially on randomization effects and the particular choice of the training-test split, which calls into question claims of superiority for particular architectures given only small improvements. We therefore investigate the influence of dataset splits in the presence of several movements of a work cycle (cross-version evaluation) and propose a best-practice evaluation strategy for MusicNet, which weakens the influence of individual test tracks and suppresses overfitting to specific works and recording conditions. A final cross-dataset evaluation suggests that improvements on one specific dataset do not necessarily generalize to other scenarios, thus emphasizing the need for further high-quality multi-pitch datasets in order to reliably measure progress in music transcription tasks.
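The abstract's concern about training-test splits can be illustrated with a small sketch. The code below is not the authors' evaluation protocol; it only shows one common way to keep every track of a given work on the same side of the split, so that a model is never tested on a work it saw during training. The function name `split_by_work`, the track and work identifiers, and the `test_fraction` parameter are all hypothetical.

```python
import random
from collections import defaultdict


def split_by_work(tracks, test_fraction=0.2, seed=0):
    """Split (track_id, work_id) pairs so that no work is on both sides.

    This is an illustrative grouped split, not the strategy from the paper.
    """
    # Collect all tracks that belong to the same work.
    by_work = defaultdict(list)
    for track_id, work_id in tracks:
        by_work[work_id].append(track_id)

    # Shuffle the works (not the tracks) with a fixed seed for repeatability.
    works = sorted(by_work)
    random.Random(seed).shuffle(works)

    # Reserve at least one whole work for testing.
    n_test = max(1, round(test_fraction * len(works)))
    test_works = set(works[:n_test])

    train = [t for w in works if w not in test_works for t in by_work[w]]
    test = [t for w in works if w in test_works for t in by_work[w]]
    return train, test, test_works


# Hypothetical metadata: seven tracks drawn from five works.
tracks = [
    ("t1", "workA"), ("t2", "workA"), ("t3", "workB"),
    ("t4", "workC"), ("t5", "workC"), ("t6", "workD"), ("t7", "workE"),
]
train, test, test_works = split_by_work(tracks, test_fraction=0.4, seed=1)
```

Repeating such a split with several seeds and reporting the spread of scores is one way to see how much a result depends on the particular split rather than on the architecture.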
Pages: 2814-2827
Page count: 14