A Deep Neural Network for Modeling Music

Cited by: 15
Authors
Zhang, Pengjing
Zheng, Xiaoqing [1]
Zhang, Wenqiang
Li, Siyan
Qian, Sheng
He, Wenqi
Zhang, Shangtong
Wang, Ziyuan
Affiliations
[1] Fudan Univ, Sch Comp Sci, Shanghai, Peoples R China
Source
ICMR'15: PROCEEDINGS OF THE 2015 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL | 2015
Keywords
Music classification; feature learning; neural network; information retrieval
DOI
10.1145/2671188.2749367
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We propose a convolutional neural network architecture with a k-max pooling layer for semantic modeling of music. The aim of a music model is to analyze and represent the semantic content of music for the purposes of classification, discovery, or clustering. The k-max pooling layer lets the network pool the k most active features, capturing the semantically rich and time-varying information in music. Our network takes a piece of music as input in the form of a sequence of audio words, where each audio word is associated with a distributed feature vector that can be fine-tuned by backpropagating errors during training. The architecture allows us to take advantage of better-trained audio word embeddings and deep structures to produce more robust music representations. Experimental results on two different music collections show that our neural networks achieved the best accuracy in music genre classification compared with three state-of-the-art systems.
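As a rough illustration of the k-max pooling idea mentioned in the abstract, the following NumPy sketch keeps the k largest activations of each feature along the time axis. It is only a minimal sketch under stated assumptions: the function name `k_max_pooling`, the (features, time steps) layout, and the choice to preserve the temporal order of the selected values are all assumptions made for illustration, not details confirmed by the abstract.

```python
import numpy as np


def k_max_pooling(features: np.ndarray, k: int) -> np.ndarray:
    """Keep the k largest activations in each row, in their original order.

    features: array of shape (num_features, num_time_steps).
    Returns an array of shape (num_features, k).
    Hypothetical helper for illustration; not the authors' implementation.
    """
    if not 1 <= k <= features.shape[1]:
        raise ValueError("k must be between 1 and the number of time steps")
    # Column indices of the k largest values in each row (in no particular order).
    top_idx = np.argpartition(features, -k, axis=1)[:, -k:]
    # Sort the indices so the selected values keep their temporal order
    # (an assumption; the abstract does not say whether order is preserved).
    top_idx = np.sort(top_idx, axis=1)
    return np.take_along_axis(features, top_idx, axis=1)


# Toy activations: 2 feature maps over 5 time steps.
activations = np.array([[0.1, 0.9, 0.3, 0.7, 0.2],
                        [0.5, 0.4, 0.8, 0.1, 0.6]])
pooled = k_max_pooling(activations, k=2)
print(pooled)
```

With k = 1 this reduces to ordinary max pooling over time; a larger k retains several strong activations per feature, which is one way to read "time-varying information" in the abstract.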
Pages: 379-386
Number of pages: 8