Time Series Models for Semantic Music Annotation

Cited by: 30
Authors
Coviello, Emanuele [1]
Chan, Antoni B. [2]
Lanckriet, Gert [1]
Affiliations
[1] Univ Calif San Diego, Dept Elect & Comp Engn, La Jolla, CA 92093 USA
[2] City Univ Hong Kong, Dept Comp Sci, Hong Kong, Hong Kong, Peoples R China
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2011 / Vol. 19 / No. 5
Funding
U.S. National Science Foundation
Keywords
Audio annotation and retrieval; dynamic texture model; music information retrieval; classification
DOI
10.1109/TASL.2010.2090148
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Many state-of-the-art systems for automatic music tagging model music with bag-of-features representations, which give little or no account of temporal dynamics, a key characteristic of the audio signal. We describe a novel approach to automatic music annotation and retrieval that captures temporal (e.g., rhythmical) aspects as well as timbral content. The proposed approach leverages a recently proposed song model based on a generative time series model of the musical content, the dynamic texture mixture (DTM) model, which treats fragments of audio as the output of a linear dynamical system. To model characteristic temporal dynamics and timbral content at the tag level, a novel, efficient hierarchical expectation-maximization (EM) algorithm for DTM (HEM-DTM) is used to summarize the common information shared by the DTMs that model the individual songs associated with a tag. Experiments show that learning the semantics of music benefits from modeling temporal dynamics.
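The abstract describes each audio fragment as the output of a linear dynamical system (a dynamic texture), and a song as a mixture of such systems (a DTM). The sketch below illustrates only that generative view, not the authors' implementation. It assumes the standard dynamic texture form x_t = A x_{t-1} + v_t, y_t = C x_t + mean + w_t, simplifies all noise to isotropic Gaussians with scalar standard deviations (s_std, q_std, r_std), and uses plain Python lists instead of a linear-algebra library. The functions `matvec`, `sample_dynamic_texture` and `sample_dtm` and all parameter values are hypothetical. Parameter learning and the HEM-DTM algorithm are not covered.

```python
import random


def matvec(M, v):
    """Multiply a matrix M (given as a list of rows) by a vector v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def sample_dynamic_texture(A, C, mean, mu0, s_std, q_std, r_std, T, rng):
    """Draw T observation vectors from one dynamic texture (a linear dynamical system).

    Simplified form assumed here (isotropic Gaussian noise everywhere):
        x_1 = mu0 + noise with std s_std
        x_t = A x_{t-1} + v_t,     v_t ~ N(0, q_std^2 I)   for t >= 2
        y_t = C x_t + mean + w_t,  w_t ~ N(0, r_std^2 I)
    """
    x = [m + rng.gauss(0.0, s_std) for m in mu0]
    ys = []
    for t in range(T):
        if t > 0:
            # State transition: the hidden state evolves linearly over time.
            x = [xi + rng.gauss(0.0, q_std) for xi in matvec(A, x)]
        # Observation: a noisy linear projection of the hidden state
        # (e.g. one frame of audio features).
        y = [yi + mi + rng.gauss(0.0, r_std) for yi, mi in zip(matvec(C, x), mean)]
        ys.append(y)
    return ys


def sample_dtm(components, weights, T, rng):
    """Sample one audio fragment from a mixture of dynamic textures.

    A component index k is drawn with probability proportional to weights[k];
    that single component's linear dynamical system then generates the whole fragment.
    """
    k = rng.choices(range(len(components)), weights=weights)[0]
    return k, sample_dynamic_texture(T=T, rng=rng, **components[k])


if __name__ == "__main__":
    rng = random.Random(0)
    # Two hypothetical components: a slowly decaying one and an oscillating one.
    slow = dict(A=[[0.9, 0.0], [0.0, 0.8]], C=[[1.0, 0.5], [0.2, 1.0], [0.0, 1.0]],
                mean=[0.0, 0.0, 0.0], mu0=[1.0, -1.0], s_std=0.1, q_std=0.05, r_std=0.01)
    fast = dict(A=[[0.0, -0.9], [0.9, 0.0]], C=[[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
                mean=[1.0, 1.0, 1.0], mu0=[1.0, 0.0], s_std=0.1, q_std=0.05, r_std=0.01)
    k, frames = sample_dtm([slow, fast], weights=[0.7, 0.3], T=5, rng=rng)
    print("component", k, "first frame", frames[0])
```

Drawing one component per fragment, rather than per frame, matches the description of a fragment as the output of a single linear dynamical system. The transition matrix A carries the temporal dynamics, while C and mean shape the content of each observed frame.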
Pages: 1343-1359
Page count: 17