Dynamic Time Warping for Music Retrieval Using Time Series Modeling of Musical Emotions

Cited by: 29
Authors
Deng, James J. [1]
Leung, Clement H. C. [1]
Affiliations
[1] Hong Kong Baptist Univ, Dept Comp Sci, Hong Kong, Hong Kong, Peoples R China
Keywords
Musical emotion; multiple dynamic textures; EM algorithm; Kalman filter and smoother; dynamic time warping; RESPONSES; RECOGNITION; PREDICTION; EXPRESSION; MIXTURES;
DOI
10.1109/TAFFC.2015.2404352
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Musical signals carry rich temporal information not only at the physical level but also at the emotional level. Listeners may wish to find music excerpts whose sequence patterns of musical emotions are similar to those of given excerpts. Most state-of-the-art systems for emotion-based music retrieval concentrate on static analysis of musical emotions and ignore dynamic analysis and modeling of musical emotions over time. This paper presents a novel approach to music retrieval based on time-varying musical emotion dynamics. A three-dimensional musical emotion model, Resonance-Arousal-Valence (RAV), is used, and the emotions of a piece of music are represented as musical emotion dynamics in a time series. A multiple dynamic textures (MDT) model is proposed to model music and emotion dynamics over time, and the expectation maximization (EM) algorithm, together with Kalman filtering and smoothing, is used to estimate the model parameters. Two smoothing methods, Rauch-Tung-Striebel (RTS) smoothing and minimum-variance smoothing (MVS), are investigated and compared in order to make the model robust and to find an optimal solution that enhances prediction. To find similar sequence patterns of musical emotions, subsequence dynamic time warping (DTW) for emotion dynamics matching is presented. Experimental results demonstrate the benefits of MDT for predicting time-varying musical emotions, and the proposed method for music retrieval based on emotion dynamics outperforms retrieval methods based on acoustic features.
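The subsequence DTW step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `subsequence_dtw`, the Euclidean frame distance, the three-move step pattern, and the toy RAV values are all assumptions made for illustration. The sketch aligns a short query sequence of 3-D emotion vectors against a longer target sequence, letting the match start and end at any target frame.

```python
import math


def subsequence_dtw(query, target):
    """Find the target subsequence that best matches the query.

    query, target: lists of equal-dimension points (e.g. hypothetical
    Resonance-Arousal-Valence triples, one per time frame).
    Returns (cost, start, end), where start and end are inclusive
    target indices of the best-matching subsequence.
    """
    n, m = len(query), len(target)
    # Frame-to-frame distances (assumption: Euclidean distance).
    dist = [[math.dist(q, t) for t in target] for q in query]

    # acc[i][j]: cheapest cost of aligning query[0..i] to a target
    # subsequence that ends at frame j.
    acc = [[math.inf] * m for _ in range(n)]
    acc[0] = list(dist[0])  # free start: a match may begin at any frame
    for i in range(1, n):
        acc[i][0] = acc[i - 1][0] + dist[i][0]
        for j in range(1, m):
            acc[i][j] = dist[i][j] + min(
                acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1]
            )

    # Free end: pick the cheapest cell in the last query row.
    end = min(range(m), key=lambda k: acc[n - 1][k])

    # Backtrack to the first query row to recover the start frame.
    i, j = n - 1, end
    while i > 0:
        if j == 0:
            i -= 1
        else:
            moves = [
                (acc[i - 1][j - 1], i - 1, j - 1),
                (acc[i - 1][j], i - 1, j),
                (acc[i][j - 1], i, j - 1),
            ]
            _, i, j = min(moves)
    return acc[n - 1][end], j, end


# Toy data (made up): the query reappears at target frames 1..3.
query = [(0, 0, 0), (1, 1, 1), (2, 2, 2)]
target = [(5, 5, 5), (0, 0, 0), (1, 1, 1), (2, 2, 2), (5, 5, 5)]
cost, start, end = subsequence_dtw(query, target)
print(cost, start, end)  # 0.0 1 3
```

Unlike full-sequence DTW, the first row of the accumulated-cost matrix is not summed along the target axis and the end point is chosen as the minimum of the last row, which is what allows the query to match anywhere inside a longer piece.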
Pages: 137-151
Number of pages: 15