Coviello, Emanuele [1]
Chan, Antoni B. [2]
Lanckriet, Gert [1]
Affiliations:
[1] Univ Calif San Diego, Dept Elect & Comp Engn, La Jolla, CA 92093 USA
[2] City Univ Hong Kong, Dept Comp Sci, Hong Kong, Hong Kong, Peoples R China
Source:
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2011, Vol. 19, No. 5
Funding:
U.S. National Science Foundation;
Keywords:
Audio annotation and retrieval;
dynamic texture model;
music information retrieval;
CLASSIFICATION;
DOI:
10.1109/TASL.2010.2090148
Chinese Library Classification number:
O42 [Acoustics];
Discipline classification code:
070206 ;
082403 ;
Abstract:
Many state-of-the-art systems for automatic music tagging model music based on bag-of-features representations, which give little or no account of temporal dynamics, a key characteristic of the audio signal. We describe a novel approach to automatic music annotation and retrieval that captures temporal (e.g., rhythmical) aspects as well as timbral content. The proposed approach leverages a recently proposed song model that is based on a generative time series model of the musical content, the dynamic texture mixture (DTM) model, which treats fragments of audio as the output of a linear dynamical system. To model characteristic temporal dynamics and timbral content at the tag level, a novel, efficient, and hierarchical expectation-maximization (EM) algorithm for DTM (HEM-DTM) is used to summarize the common information shared by DTMs modeling individual songs associated with a tag. Experiments show that learning the semantics of music benefits from modeling temporal dynamics.
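The abstract's phrase "the output of a linear dynamical system" can be illustrated with a minimal sketch. This is not the paper's DTM or HEM-DTM implementation: it samples from a single linear dynamical system, whereas a DTM is a mixture of such systems. The function names (`matvec`, `sample_lds`), the matrices `A` and `C`, and the noise levels are all hypothetical toy values chosen for illustration only.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def sample_lds(A, C, x0, num_steps, state_noise=0.0, obs_noise=0.0, seed=0):
    """Sample observations from a linear dynamical system.

    Hidden state:  x_{t+1} = A x_t + v_t,  v_t ~ N(0, state_noise^2 I)
    Observation:   y_t     = C x_t + w_t,  w_t ~ N(0, obs_noise^2 I)
    """
    rng = random.Random(seed)
    x = list(x0)
    observations = []
    for _ in range(num_steps):
        # Emit an observation (e.g. one audio feature vector) from the state.
        y = [value + rng.gauss(0.0, obs_noise) for value in matvec(C, x)]
        observations.append(y)
        # Advance the hidden state, which carries the temporal dynamics.
        x = [value + rng.gauss(0.0, state_noise) for value in matvec(A, x)]
    return observations


# Hypothetical toy parameters, not taken from the paper.
A = [[0.9, -0.3], [0.3, 0.9]]             # damped rotation of a 2-D state
C = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # maps the state to 3-D features
frames = sample_lds(A, C, x0=[1.0, 0.0], num_steps=5,
                    state_noise=0.05, obs_noise=0.01)
for frame in frames:
    print([round(value, 3) for value in frame])
```

Unlike a bag-of-features representation, the order of the sampled frames matters here, because each frame depends on the hidden state carried over from the previous step.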