Solos: A Dataset for Audio-Visual Music Analysis

Times Cited: 2
Authors
Montesinos, Juan F. [1 ]
Slizovskaia, Olga [1 ]
Haro, Gloria [1 ]
Affiliations
[1] Univ Pompeu Fabra, Dept Informat & Commun Technol, Barcelona, Spain
Source
2020 IEEE 22ND INTERNATIONAL WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING (MMSP) | 2020
Funding
European Union Horizon 2020;
Keywords
audio-visual; dataset; multimodal; music; SOURCE SEPARATION; AUDIO;
DOI
10.1109/mmsp48831.2020.9287124
Chinese Library Classification (CLC) Number
TP31 [Computer Software];
Discipline Classification Code
081202 ; 0835 ;
Abstract
In this paper, we present a new dataset of music performance videos which can be used for training machine learning methods for multiple tasks such as audio-visual blind source separation and localization, cross-modal correspondences, cross-modal generation and, in general, any audio-visual self-supervised task. These videos, gathered from YouTube, consist of solo musical performances of 13 different instruments. Compared to previously proposed audio-visual datasets, Solos is cleaner, since a large proportion of its recordings are auditions and manually checked recordings, ensuring that there is no background noise and no effects added in video post-processing. Moreover, to the best of our knowledge, it is the only dataset that contains the whole set of instruments present in the URMP [1] dataset, a high-quality dataset of 44 audio-visual recordings of multi-instrument classical music pieces with individual audio tracks. Since URMP was intended to be used for source separation, we evaluate on the URMP dataset the performance of two different source-separation models trained on Solos. The dataset is publicly available at https://juanfmontesinos.github.io/Solos/.
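Because the clips are gathered from YouTube, a typical first step for anyone working with the dataset is to fetch the listed videos locally. The following is a minimal, hypothetical sketch in Python using the yt-dlp library; it assumes a made-up JSON file solos_ids.json that maps each of the 13 instrument names to a list of YouTube video IDs, and it is not the official Solos download tooling.

# Hypothetical sketch: download solo-performance videos per instrument,
# assuming a JSON file mapping instrument names to YouTube video IDs.
import json
from pathlib import Path

import yt_dlp  # pip install yt-dlp


def download_solos(id_file: str = "solos_ids.json", out_dir: str = "solos") -> None:
    """Download every listed YouTube video into one folder per instrument."""
    with open(id_file) as f:
        ids_per_instrument = json.load(f)  # e.g. {"Violin": ["abc123", ...], ...}

    for instrument, video_ids in ids_per_instrument.items():
        target = Path(out_dir) / instrument
        target.mkdir(parents=True, exist_ok=True)
        opts = {
            "format": "mp4",
            "outtmpl": str(target / "%(id)s.%(ext)s"),
            "ignoreerrors": True,  # some videos may have been removed
        }
        urls = [f"https://www.youtube.com/watch?v={vid}" for vid in video_ids]
        with yt_dlp.YoutubeDL(opts) as ydl:
            ydl.download(urls)


if __name__ == "__main__":
    download_solos()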
Pages: 6
References (43 in total)
[1] [Anonymous], Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
[2] [Anonymous], 2017, Proceedings of the Sound and Music Computing Conference.
[3] [Anonymous], 2015, ACS Symposium Series.
[4] Arandjelovic R., 2018, Lecture Notes in Computer Science, DOI 10.1007/978-3-030-01246-5_27.
[5] Cao Z., Hidalgo G., Simon T., Wei S.-E., Sheikh Y. OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(1):172-186.
[6] Chandna P., Miron M., Janer J., Gomez E. Monoaural Audio Source Separation Using Deep Convolutional Neural Networks. Latent Variable Analysis and Signal Separation (LVA/ICA 2017), 2017, 10169:258-266.
[7] Chen L., Srivastava S., Duan Z., Xu C. Deep Cross-Modal Audio-Visual Generation. Proceedings of the Thematic Workshops of ACM Multimedia 2017, 2017:349-357.
[9] Darrell T., 2000, Lecture Notes in Computer Science, Vol. 1948, p. 32.
[10] Dixon S., 2018, Proceedings of ISMIR 2018.