Music Emotion Recognition by Using Chroma Spectrogram and Deep Visual Features

Cited by: 34
Authors
Er, Mehmet Bilal [1 ]
Aydilek, Ibrahim Berkan [1 ]
Affiliations
[1] Harran Univ, Dept Comp Engn, Fac Engn, TR-63050 Sanliurfa, Turkey
Keywords
Music emotion recognition; Deep learning; Deep features; Chroma spectrogram; AlexNet; VGG-16;
DOI
10.2991/ijcis.d.191216.001
CLC (Chinese Library Classification)
TP18 [Artificial intelligence theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Music plays an important role in human life because of its ability to trigger and convey emotions. Recognizing musical emotion is studied across many disciplines, including science, psychology, musicology, and art, and has attracted growing research attention in recent years. Many researchers extract acoustic features from music and investigate the relations between these features and emotional tags. More recent studies instead classify music emotionally with deep learning applied to spectrograms, which carry both time- and frequency-domain information. The present study proposes a new method for music emotion recognition that feeds chroma spectrograms extracted from music recordings into a pre-trained deep learning model. The AlexNet architecture is used as the pre-trained network; its conv5, Fc6, Fc7, and Fc8 layers serve as the feature-extracting layers, and deep visual features are taken from them. The extracted deep features are used to train and test Support Vector Machine (SVM) and Softmax classifiers. In addition, to assess the effective power of pre-trained deep networks in music emotion recognition, deep visual features are extracted from the conv5_3, Fc6, Fc7, and Fc8 layers of the VGG-16 model and the same experiments are repeated. Experiments on two datasets show that the proposed method yields better results; the best result, 89.2%, is obtained from the Fc7 layer of VGG-16 on our dataset. (C) 2019 The Authors. Published by Atlantis Press SARL.
Pages: 1622-1634
Page count: 13