Semantic annotation and retrieval of music and sound effects

Cited by: 275
Authors
Turnbull, Douglas [1 ]
Barrington, Luke [2 ]
Torres, David [1 ]
Lanckriet, Gert [2 ]
Affiliations
[1] Univ Calif San Diego, Dept Comp Sci & Engn, La Jolla, CA 92093 USA
[2] Univ Calif San Diego, Dept Elect & Comp Engn, La Jolla, CA 92093 USA
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2008, Vol. 16, No. 2
Funding
U.S. National Science Foundation;
Keywords
audio annotation and retrieval; music information retrieval; semantic music analysis;
DOI
10.1109/TASL.2007.913750
Chinese Library Classification
O42 [Acoustics];
Discipline codes
070206 ; 082403 ;
Abstract
We present a computer audition system that can both annotate novel audio tracks with semantically meaningful words and retrieve relevant tracks from a database of unlabeled audio content given a text-based query. We consider the related tasks of content-based audio annotation and retrieval as one supervised multiclass, multilabel problem in which we model the joint probability of acoustic features and words. We collect a data set of 1700 human-generated annotations that describe 500 Western popular music tracks. For each word in a vocabulary, we use this data to train a Gaussian mixture model (GMM) over an audio feature space. We estimate the parameters of the model using the weighted mixture hierarchies expectation maximization algorithm. This algorithm is more scalable to large data sets and produces better density estimates than standard parameter estimation techniques. The quality of the music annotations produced by our system is comparable with the performance of humans on the same task. Our "query-by-text" system can retrieve appropriate songs for a large number of musically relevant words. We also show that our audition system is general by learning a model that can annotate and retrieve sound effects.
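The abstract's core modeling idea, one Gaussian mixture model per vocabulary word trained on the audio features of tracks annotated with that word, then shared between annotation and "query-by-text" retrieval, can be sketched in a few lines. This is a minimal illustration using scikit-learn's standard EM and synthetic features, not the paper's weighted mixture-hierarchies EM or its MFCC-based feature space; all track names, labels, and helper functions are hypothetical.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Synthetic "tracks": each is a matrix of per-frame audio feature vectors.
tracks = {
    "song_a": rng.normal(0.0, 1.0, size=(200, 4)),
    "song_b": rng.normal(3.0, 1.0, size=(200, 4)),
}
# Human annotations: which vocabulary words describe which tracks.
labels = {"mellow": ["song_a"], "loud": ["song_b"]}

# Train one GMM per word on the frames of its positively labeled tracks.
word_models = {}
for word, track_ids in labels.items():
    frames = np.vstack([tracks[t] for t in track_ids])
    word_models[word] = GaussianMixture(n_components=2, random_state=0).fit(frames)

def annotate(features):
    """Score each word by its average per-frame log-likelihood on a track."""
    return {w: m.score(features) for w, m in word_models.items()}

def retrieve(word):
    """Rank all tracks for a one-word text query, best match first."""
    m = word_models[word]
    return sorted(tracks, key=lambda t: m.score(tracks[t]), reverse=True)
```

With these well-separated synthetic distributions, `retrieve("loud")` ranks `song_b` first, and `annotate` assigns `song_a` a higher score under the "mellow" model than under the "loud" one, mirroring how a single set of per-word densities supports both tasks.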
Pages: 467-476
Page count: 10