Time Series Models for Semantic Music Annotation

Cited by: 30

Authors:

Coviello, Emanuele [1]

Chan, Antoni B. [2]

Lanckriet, Gert [1]

Affiliations:

[1] Univ Calif San Diego, Dept Elect & Comp Engn, La Jolla, CA 92093 USA

[2] City Univ Hong Kong, Dept Comp Sci, Hong Kong, Hong Kong, Peoples R China

Source:

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2011 / Vol. 19 / No. 5

Funding:

U.S. National Science Foundation

Keywords:

Audio annotation and retrieval; dynamic texture model; music information retrieval; CLASSIFICATION;

DOI:

10.1109/TASL.2010.2090148

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Subject classification codes:

070206; 082403

Abstract:

Many state-of-the-art systems for automatic music tagging model music based on bag-of-features representations which give little or no account of temporal dynamics, a key characteristic of the audio signal. We describe a novel approach to automatic music annotation and retrieval that captures temporal (e.g., rhythmical) aspects as well as timbral content. The proposed approach leverages a recently proposed song model that is based on a generative time series model of the musical content, the dynamic texture mixture (DTM) model, that treats fragments of audio as the output of a linear dynamical system. To model characteristic temporal dynamics and timbral content at the tag level, a novel, efficient, and hierarchical expectation-maximization (EM) algorithm for DTM (HEM-DTM) is used to summarize the common information shared by DTMs modeling individual songs associated with a tag. Experiments show learning the semantics of music benefits from modeling temporal dynamics.


Pages: 1343-1359

Page count: 17

