Robustness of musical features on deep learning models for music genre classification

Times cited: 28

Authors:

Singh, Yeshwant [1]

Biswas, Anupam [1]

Affiliation:

[1] Natl Inst Technol Silchar, Dept Comp Sci & Engn, Silchar 788010, Assam, India

Source:

Expert Systems with Applications | 2022 / Vol. 199

Keywords:

Musical features; Music genre classification; Music information retrieval; Deep learning; Recognition

DOI:

10.1016/j.eswa.2022.116879

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Music information retrieval (MIR) has witnessed rapid advances in tasks such as musical similarity and music genre classification (MGC). Many researchers have approached MGC and audio tagging with various features, using both traditional machine learning and deep learning (DL) techniques. DL-based models require a large amount of data to generalize well to new data samples. Unfortunately, the lack of sizeable open music datasets makes analyzing the robustness of musical features on DL models all the more necessary. This paper therefore assesses and compares the robustness of some commonly used musical and non-musical features on DL models for the MGC task, evaluating the performance of selected models on multiple features extracted from various datasets that together account for billions of segmented data samples. In our evaluation, Mel-scale-based features and Swaragram showed high robustness across the datasets and across various DL models for the MGC task.
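As background on the Mel-scale-based features named in the abstract: such features are usually built by mapping frequency onto the mel scale and pooling a short-time power spectrum through overlapping triangular filters. The sketch below is a minimal NumPy-only illustration, not the authors' feature-extraction pipeline. It assumes the HTK-style mel formula m = 2595 log10(1 + f/700), unnormalized triangular filters, and illustrative parameter values (sr = 22050 Hz, n_fft = 2048, hop = 512, 128 mel bands); the function names are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale (an assumption; the Slaney variant is another common default)
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    # Inverse of hz_to_mel
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels, fmin=0.0, fmax=None):
    """Unnormalized triangular filters mapping n_fft//2 + 1 FFT bins to n_mels bands."""
    fmax = sr / 2.0 if fmax is None else fmax
    # n_mels + 2 points, equally spaced on the mel axis, give each filter's edges and center
    hz_points = mel_to_hz(np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2))
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(y, sr=22050, n_fft=2048, hop=512, n_mels=128):
    """Log-mel spectrogram of a mono signal y, returned with shape (n_mels, n_frames)."""
    if len(y) < n_fft:
        raise ValueError("signal is shorter than one FFT frame")
    window = np.hanning(n_fft)
    # No padding: an incomplete final frame is dropped
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack([y[t * hop : t * hop + n_fft] * window for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2   # (n_frames, n_fft//2 + 1)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T   # (n_frames, n_mels)
    return np.log(mel + 1e-10).T                        # small offset avoids log(0)

if __name__ == "__main__":
    sr = 22050
    t = np.arange(sr) / sr                       # one second of audio
    y = np.sin(2 * np.pi * 440.0 * t)            # synthetic 440 Hz tone
    S = log_mel_spectrogram(y, sr=sr)
    print(S.shape)                               # (128, 40)
```

Spacing the filters evenly on the mel axis makes them narrow at low frequencies and wide at high ones, which is the property that distinguishes mel-based representations from a plain linear-frequency spectrogram.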

Pages: 14

References: 82 in total
