A Deep Neural Network for Modeling Music

Cited by: 15

Authors:

Zhang, Pengjing

Zheng, Xiaoqing [1]

Zhang, Wenqiang

Li, Siyan

Qian, Sheng

He, Wenqi

Zhang, Shangtong

Wang, Ziyuan

Affiliation:

[1] Fudan Univ, Sch Comp Sci, Shanghai, Peoples R China

Source:

ICMR'15: PROCEEDINGS OF THE 2015 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL | 2015

Keywords:

Music classification; feature learning; neural network; INFORMATION-RETRIEVAL

DOI:

10.1145/2671188.2749367

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

We propose a convolutional neural network architecture with a k-max pooling layer for semantic modeling of music. The aim of a music model is to analyze and represent the semantic content of music for purposes of classification, discovery, or clustering. The k-max pooling layer allows the network to pool the k most active features, capturing semantically rich, time-varying information about the music. Our network takes input music as a sequence of audio words, where each audio word is associated with a distributed feature vector that can be fine-tuned by backpropagating errors during training. The architecture allows us to take advantage of better-trained audio word embeddings and deep structures to produce more robust music representations. Experimental results on two different music collections show that our neural networks achieved the best accuracy in music genre classification compared with three state-of-the-art systems.
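The abstract describes the data flow only at a high level. The sketch below illustrates one plausible reading of it under explicit assumptions: audio words are integer IDs into a hypothetical embedding table, a single hypothetical convolution filter followed by a ReLU stands in for a full convolutional layer, and k-max pooling follows the order-preserving definition of Kalchbrenner et al. (2014), which the abstract does not confirm for this paper. The helper names (`embed`, `conv1d`, `k_max_pooling`) and all numbers are illustrative, not the authors' implementation; training and backpropagation are omitted.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def embed(audio_word_ids: Sequence[int], table: Dict[int, Vector]) -> List[Vector]:
    """Map each audio-word ID to its (hypothetical) embedding vector, one per time step."""
    return [table[w] for w in audio_word_ids]


def conv1d(vectors: List[Vector], filt: List[Vector], bias: float = 0.0) -> List[float]:
    """Narrow 1-D convolution over time with one filter, followed by ReLU (assumed).

    `filt` holds `width` weight vectors; each output position is the dot product
    of the filter with a window of `width` consecutive embeddings.
    """
    width = len(filt)
    feature_map = []
    for t in range(len(vectors) - width + 1):
        s = bias
        for j in range(width):
            s += sum(w * x for w, x in zip(filt[j], vectors[t + j]))
        feature_map.append(max(0.0, s))
    return feature_map


def k_max_pooling(values: Sequence[float], k: int) -> List[float]:
    """Keep the k largest activations, preserving their original temporal order."""
    if k <= 0:
        raise ValueError("k must be positive")
    if k >= len(values):
        return list(values)
    # Rank positions by activation (ties go to the earlier position); keep the top k.
    top = sorted(range(len(values)), key=lambda i: (-values[i], i))[:k]
    # Re-sort the kept positions by time so the output retains sequence order.
    return [values[i] for i in sorted(top)]


if __name__ == "__main__":
    table = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}  # hypothetical 2-D embeddings
    ids = [0, 2, 1, 2, 0]                                  # hypothetical audio-word sequence
    filt = [[1.0, -1.0], [0.5, 0.5]]                       # hypothetical width-2 filter
    fmap = conv1d(embed(ids, table), filt)
    print(fmap)                    # [2.0, 0.5, 0.0, 0.5]
    print(k_max_pooling(fmap, 2))  # [2.0, 0.5]
```

Note that `k_max_pooling([3, 1, 4, 1, 5, 9, 2, 6], 3)` returns `[5, 9, 6]` rather than the sorted `[9, 6, 5]`. Keeping temporal order is what lets the pooled features retain information about when the strongest activations occur, which matches the abstract's emphasis on time-varying information.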


Pages: 379-386

Page count: 8

References: 43
