Audio-Based Music Classification with DenseNet and Data Augmentation

Cited by: 14
Authors
Bian, Wenhao [1, 2]
Wang, Jie [2]
Zhuang, Bojin [2]
Yang, Jiankui [1]
Wang, Shaojun [2]
Xiao, Jing [2]
Affiliations
[1] Beijing Univ Posts & Telecommn, Beijing, Peoples R China
[2] Ping An Technol Shenzhen Co Ltd, Shenzhen, Peoples R China
Source
PRICAI 2019: TRENDS IN ARTIFICIAL INTELLIGENCE, PT III | 2019 / Vol. 11672
Keywords
Music classification; Spectrogram; CNN; ResNet; DenseNet; Deep learning;
DOI
10.1007/978-3-030-29894-4_5
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In recent years, deep learning has received intense attention owing to its great success in image recognition. Its adoption has spread to many information processing fields, including music information retrieval (MIR). In this paper, we conduct a comprehensive study of music audio classification with improved convolutional neural networks (CNNs). To the best of our knowledge, this is the first work to apply Densely Connected Convolutional Networks (DenseNet), which have been demonstrated to perform better than the residual neural network (ResNet), to music audio tagging. Additionally, two data augmentation approaches, time overlapping and pitch shifting, are proposed to address the shortage of labelled data in MIR. Moreover, stacking-based ensemble learning with an SVM is employed. We believe that the proposed combination of DenseNet's strong representation and data augmentation can be adapted to other audio processing tasks.
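The abstract names time overlapping as an augmentation but does not say how it is implemented. One common reading is that each clip is cut into fixed-length windows whose start positions advance by less than the window length, so a single labelled clip yields more training segments. The sketch below illustrates only that reading; the function name `overlapping_segments` and the `window` and `hop` values are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def overlapping_segments(
    samples: Sequence[float], window: int, hop: int
) -> List[List[float]]:
    """Cut `samples` into windows of length `window` starting every `hop` samples.

    When hop < window, consecutive windows share (window - hop) samples.
    A trailing remainder shorter than `window` is dropped.
    Assumption: this is an illustrative reading of "time overlapping",
    not the authors' confirmed procedure.
    """
    if window <= 0 or hop <= 0:
        raise ValueError("window and hop must be positive")
    last_start = len(samples) - window
    return [list(samples[s:s + window]) for s in range(0, last_start + 1, hop)]


clip = list(range(10))  # stand-in for audio samples of one labelled clip
no_overlap = overlapping_segments(clip, window=4, hop=4)
half_overlap = overlapping_segments(clip, window=4, hop=2)
print(len(no_overlap), len(half_overlap))
```

Under this reading, halving the hop roughly doubles the number of segments per clip, and every segment inherits the label of the clip it came from.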
Pages: 56-65
Page count: 10