PERCEPTUAL MUSICAL SIMILARITY METRIC LEARNING WITH GRAPH NEURAL NETWORKS

Times Cited: 0
Authors
Vahidi, Cyrus [1 ]
Singh, Shubhr [1 ]
Benetos, Emmanouil [1 ]
Phan, Huy [2 ]
Stowell, Dan [3 ]
Fazekas, Gyorgy [1 ]
Lagrange, Mathieu [4 ]
Affiliations
[1] Queen Mary Univ London, Ctr Digital Mus, London, England
[2] Amazon Alexa, Cambridge, MA, USA
[3] Tilburg Univ, Bijsterveldenlaan, Tilburg, Netherlands
[4] Nantes Univ, CNRS, Ecole Cent Nantes, LS2N, Nantes, France
Source
2023 IEEE WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS, WASPAA | 2023
Funding
UK Engineering and Physical Sciences Research Council (EPSRC); UK Research and Innovation (UKRI)
Keywords
auditory similarity; content-based music retrieval; graph neural networks; metric learning;
DOI
10.1109/WASPAA58266.2023.10248151
CLC Number
O42 [Acoustics]
Subject Classification Codes
070206; 082403
Abstract
Sound retrieval for assisted music composition depends on evaluating similarity between musical instrument sounds, which is partly influenced by playing techniques. Previous methods using Euclidean nearest neighbours over acoustic features struggle to retrieve sounds that share equivalent timbral properties but were generated with a different instrument, playing technique, pitch or dynamic. In this paper, we present a metric learning system designed to approximate human similarity judgments between extended musical playing techniques using graph neural networks. Such structures are a natural candidate for similarity retrieval tasks, yet they have seen little application in modelling perceptual music similarity. We optimize a Graph Convolutional Network (GCN) over acoustic features via a proxy metric learning loss to learn embeddings that reflect perceptual similarities. Specifically, we construct the graph's adjacency matrix from the acoustic data manifold with an example-wise adaptive k-nearest-neighbour graph: the Adaptive Neighbourhood Graph Neural Network (AN-GNN). Our approach achieves 96.4% retrieval accuracy, compared to 38.5% with a Euclidean metric and 86.0% with a multilayer perceptron (MLP), while effectively considering retrievals whose playing technique differs from that of the query example.
Pages: 5
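As a rough illustration of the pipeline the abstract describes, the sketch below builds an example-wise adaptive k-nearest-neighbour adjacency matrix over acoustic features, propagates the features through a small graph convolutional network, and trains the embeddings with a proxy-based metric learning loss. This is a minimal PyTorch sketch, not the paper's implementation: the function and parameter names (adaptive_knn_adjacency, k_max, ratio, temp), the adaptivity rule used here (capping each example at k_max neighbours and cutting off at a multiple of its own 1-NN distance), the layer sizes, and the proxy-NCA-style loss are all illustrative assumptions; the exact AN-GNN formulation is given in the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F

def adaptive_knn_adjacency(feats, k_max=10, ratio=1.5):
    # Pairwise Euclidean distances between per-example acoustic features.
    n = feats.size(0)
    dist = torch.cdist(feats, feats)
    dist.fill_diagonal_(float("inf"))  # exclude self-matches for now
    knn_d, knn_i = dist.topk(k_max, largest=False)  # k_max nearest per row
    # Example-wise adaptive cut-off: keep neighbours within `ratio` times
    # each example's own 1-NN distance (illustrative rule, an assumption).
    keep = knn_d <= ratio * knn_d[:, :1]
    adj = torch.zeros(n, n, device=feats.device)
    rows = torch.arange(n, device=feats.device).unsqueeze(1).expand_as(knn_i)
    adj[rows[keep], knn_i[keep]] = 1.0
    adj = torch.maximum(adj, adj.t())              # symmetrise the graph
    adj = adj + torch.eye(n, device=feats.device)  # re-add self-loops
    d_inv_sqrt = adj.sum(1).pow(-0.5)              # symmetric normalisation
    return d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]

class GCN(nn.Module):
    # Two graph-convolution layers: propagate with the normalised
    # adjacency, then apply a learned linear transform.
    def __init__(self, in_dim, hid_dim=128, out_dim=64):
        super().__init__()
        self.w1 = nn.Linear(in_dim, hid_dim)
        self.w2 = nn.Linear(hid_dim, out_dim)

    def forward(self, x, adj):
        h = F.relu(self.w1(adj @ x))
        return F.normalize(self.w2(adj @ h), dim=-1)  # unit-norm embeddings

def proxy_loss(emb, labels, proxies, temp=0.1):
    # Proxy-NCA-style objective: each embedding is pulled toward the
    # learnable proxy of its own class and pushed away from the others.
    logits = emb @ F.normalize(proxies, dim=-1).t() / temp
    return F.cross_entropy(logits, labels)

A hypothetical training step, with feats holding per-clip acoustic descriptors (e.g. scattering-based features) and labels indexing perceptual classes (sizes here are placeholders):

feats = torch.randn(256, 40)
labels = torch.randint(0, 16, (256,))
adj = adaptive_knn_adjacency(feats)
model = GCN(in_dim=40)
proxies = nn.Parameter(torch.randn(16, 64))
loss = proxy_loss(model(feats, adj), labels, proxies)
loss.backward()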