A Hybrid Parallel Computing Architecture Based on CNN and Transformer for Music Genre Classification

Cited by: 1
Authors
Chen, Jiyang [1 ,2 ]
Ma, Xiaohong [2 ]
Li, Shikuan [2 ]
Ma, Sile [1 ]
Zhang, Zhizheng [1 ]
Ma, Xiaojing [1 ]
Affiliations
[1] Shandong Univ, Inst Marine Sci & Technol, Qingdao 266237, Peoples R China
[2] Shandong Zhengzhong Informat Technol Co Ltd, Jinan 250098, Peoples R China
Keywords
music genre classification; convolutional neural networks; Transformer encoder; mel spectrogram; NEURAL-NETWORK;
DOI
10.3390/electronics13163313
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Music genre classification (MGC) underpins the efficient organization, retrieval, and recommendation of music resources, so it has substantial research value. Convolutional neural networks (CNNs) have been widely used in MGC and have achieved excellent results. However, because of their local receptive fields, CNNs cannot model global features well, and these global features are crucial for classifying music signals with temporal properties. Transformers can capture long-range dependencies within an image owing to their self-attention mechanism. Nevertheless, gaps in performance and computational cost remain between Transformers and existing CNNs. In this paper, we propose a hybrid architecture (CNN-TE) based on a CNN and a Transformer encoder for MGC. Specifically, we convert the audio signals into mel spectrograms and feed them into the hybrid model for training. Our model employs a CNN to initially capture low-level, localized features from the spectrogram. These features are then processed by a Transformer encoder, which models them globally to extract high-level, abstract semantic information. The refined information is finally classified by a multi-layer perceptron. Our experiments demonstrate that this approach surpasses many existing CNN architectures on the GTZAN and FMA datasets, while using fewer parameters and achieving faster inference.
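For illustration only, the following is a minimal PyTorch-style sketch of the kind of CNN + Transformer-encoder pipeline the abstract describes. The layer sizes, number of encoder layers, n_mels = 128, and the 10-genre output (GTZAN-style) are assumptions for the sketch, not the authors' published configuration.

```python
# Sketch of a CNN -> Transformer encoder -> MLP classifier over mel spectrograms.
# Hypothetical hyperparameters; not the authors' implementation.
import torch
import torch.nn as nn

# A mel spectrogram could be produced beforehand with librosa, e.g.:
#   mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
#   mel_db = librosa.power_to_db(mel)

class CNNTE(nn.Module):
    def __init__(self, n_genres=10, d_model=128, n_heads=4, n_layers=2):
        super().__init__()
        # CNN front end: captures low-level, localized time-frequency features.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, d_model, kernel_size=3, padding=1), nn.BatchNorm2d(d_model), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # Transformer encoder: models global (long-range) dependencies
        # across the sequence of CNN features.
        enc_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                               batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=n_layers)
        # MLP head for genre classification.
        self.head = nn.Sequential(nn.Linear(d_model, 64), nn.ReLU(),
                                  nn.Linear(64, n_genres))

    def forward(self, mel):              # mel: (batch, 1, n_mels, time)
        f = self.cnn(mel)                # (batch, d_model, n_mels/4, time/4)
        f = f.mean(dim=2)                # pool the frequency axis -> (batch, d_model, time/4)
        f = f.transpose(1, 2)            # -> (batch, time/4, d_model) token sequence
        f = self.encoder(f)              # global modelling over time steps
        return self.head(f.mean(dim=1))  # temporal average pooling + MLP classifier

# Example: a dummy 128-mel spectrogram of roughly a 30 s clip.
logits = CNNTE()(torch.randn(2, 1, 128, 1292))
print(logits.shape)  # torch.Size([2, 10])
```

The CNN stem reduces the time-frequency map to a compact token sequence, so the encoder can attend over the whole clip before the MLP head classifies it, mirroring the local-then-global flow described in the abstract.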
Pages: 13