Graph-Based Multimodal Music Mood Classification in Discriminative Latent Space

Cited by: 1
Authors
Su, Feng [1]
Xue, Hao [1]
Affiliations
[1] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing 210023, Jiangsu, Peoples R China
Source
MULTIMEDIA MODELING (MMM 2017), PT I | 2017, Vol. 10132
Funding
US National Science Foundation
Keywords
Music mood classification; Multimodal; Graph learning; Locality Preserving Projection; Bag of sentences; Emotion classification
DOI
10.1007/978-3-319-51811-4_13
CLC Classification
TP [Automation and computer technology]
Discipline Code
0812
Abstract
Automatic music mood classification is an important and challenging problem in the field of music information retrieval (MIR) and has attracted growing attention from various research areas. In this paper, we propose a novel multimodal method for music mood classification that exploits the complementarity of the lyrics and audio information of music to enhance classification accuracy. We first extract descriptive sentence-level lyrics and audio features from the music. Then, we project the paired low-level features of the two modalities into a learned common discriminative latent space, which not only eliminates between-modality heterogeneity but also increases the discriminability of the resulting descriptions. On the basis of this latent representation, we employ a graph-learning-based multimodal classification model for music mood, which takes the cross-modality similarity between local audio and lyrics descriptions into account to effectively exploit correlations between the modalities. The per-sentence mood predictions are then aggregated by a simple voting scheme. The effectiveness of the proposed method is demonstrated in experiments on a real dataset composed of more than 3,000 min of music and corresponding lyrics.
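The pipeline the abstract describes can be sketched as follows. This is a minimal illustration only: the projection matrices `W_audio` and `W_lyrics` are random stand-ins for the discriminative latent space the paper actually learns, and a nearest-centroid rule substitutes for the paper's graph-based classifier; only the data flow (project per sentence, classify, majority-vote over sentences) reflects the described method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical learned projections mapping each modality's low-level
# sentence features into a shared 4-dim latent space (random stand-ins
# for the paper's discriminatively learned projections).
W_audio = rng.standard_normal((10, 4))   # 10-dim audio features -> latent
W_lyrics = rng.standard_normal((8, 4))   # 8-dim lyrics features -> latent

def to_latent(a, l):
    """Project and fuse one sentence's audio and lyrics features."""
    return a @ W_audio + l @ W_lyrics

def classify_sentence(z, centroids):
    """Nearest-centroid stand-in for the paper's graph-based classifier."""
    return int(np.argmin(np.linalg.norm(centroids - z, axis=1)))

def song_mood(audio_feats, lyrics_feats, centroids):
    """Aggregate per-sentence mood predictions by majority voting."""
    votes = [classify_sentence(to_latent(a, l), centroids)
             for a, l in zip(audio_feats, lyrics_feats)]
    labels, counts = np.unique(votes, return_counts=True)
    return int(labels[np.argmax(counts)])

# Toy usage: a song with 5 sentences, 3 candidate mood classes
audio = rng.standard_normal((5, 10))
lyrics = rng.standard_normal((5, 8))
mood_centroids = rng.standard_normal((3, 4))
print(song_mood(audio, lyrics, mood_centroids))  # one of 0, 1, 2
```

The voting step makes the song-level decision robust to occasional per-sentence misclassifications, which is why the paper can classify at sentence granularity yet report song-level accuracy.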
Pages: 152-163 (12 pages)