CNN-based Note Onset Detection using Synthetic Data Augmentation

Cited by: 0

Authors:

Mounir, Mina [1,3]

Karsmakers, Peter [2]

van Waterschoot, Toon [1,3]

Affiliations:

[1] Katholieke Univ Leuven, Dept Elect Engn, ESAT STADIUS, Leuven, Belgium

[2] Katholieke Univ Leuven, DTAI ADVISE, Dept Comp Sci, Geel Campus, Geel, Belgium

[3] Katholieke Univ Leuven, ESAT Lab, Leuven, Belgium

Source:

28TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2020) | 2021

Funding:

European Research Council;

Keywords:

CNN; data augmentation; note onset detection;

DOI:

10.23919/eusipco47968.2020.9287621

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

Detecting the onset of notes in music excerpts is a fundamental problem in many music signal processing tasks, including analysis, synthesis, and information retrieval. When addressing the note onset detection (NOD) problem using a data-driven methodology, a major challenge is the availability and quality of labeled datasets used for both model training/tuning and evaluation. As most of the available datasets are manually annotated, the number of annotated music excerpts is limited and the annotation strategy and quality vary across datasets. To counter both problems, in this paper we propose to use semi-synthetic datasets where the music excerpts are mixes of isolated note recordings. The advantage resides in the annotations being automatically generated while mixing the notes, as isolated note onsets are straightforward to detect using a simple energy measure. A semi-synthetic dataset is used in this work for augmenting a real piano dataset when training a convolutional neural network (CNN) with three novel model training strategies. Training the CNN on a semi-synthetic dataset and retraining only the CNN classification layers on a real dataset results in a higher average F1-score with lower variance.
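The abstract does not specify the energy measure or the mixing procedure, so the following is only a minimal sketch of the general idea: locate the onset of each isolated note with a frame-energy threshold, then place the notes in a mix and record the shifted onsets as annotations. The function names, the frame size, the threshold ratio, and the toy decaying-sine "notes" are all assumptions made for illustration and are not taken from the paper.

```python
import math


def detect_onset(note, frame=64, ratio=0.1):
    """Return the start sample of the first frame whose energy exceeds
    `ratio` times the peak frame energy (an assumed threshold rule)."""
    energies = [
        sum(x * x for x in note[i:i + frame])
        for i in range(0, len(note), frame)
    ]
    peak = max(energies)
    for k, energy in enumerate(energies):
        if energy > ratio * peak:
            return k * frame
    return 0  # silent input: no frame exceeds the threshold


def mix_notes(notes, start_times, length):
    """Sum isolated notes at the given start samples.

    Returns the mixed signal and the onset annotations, which are
    produced as a by-product of the mixing itself."""
    mix = [0.0] * length
    onsets = []
    for note, start in zip(notes, start_times):
        onsets.append(start + detect_onset(note))
        for i, x in enumerate(note):
            if start + i < length:
                mix[start + i] += x
    return mix, sorted(onsets)


def synthetic_note(freq, silence, duration, sr=8000):
    """Toy stand-in for an isolated note recording (hypothetical):
    leading silence followed by an exponentially decaying sine."""
    tone = [
        math.exp(-3.0 * n / duration) * math.sin(2 * math.pi * freq * n / sr)
        for n in range(duration)
    ]
    return [0.0] * silence + tone


notes = [synthetic_note(440.0, 128, 2000), synthetic_note(660.0, 256, 2000)]
mix, onsets = mix_notes(notes, start_times=[0, 3000], length=6000)
print(onsets)
```

In this sketch each annotation is simply the placement offset plus the onset found inside the isolated recording, so no manual labeling step is involved. A real pipeline would read recorded note samples instead of the toy generator and would likely need a more robust energy measure.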


Pages: 171-175

Page count: 5
