ON THE IMPORTANCE OF MODELING TEMPORAL INFORMATION IN MUSIC TAG ANNOTATION

Cited by: 7
Authors
Reed, Jeremy [1]
Lee, Chin-Hui [1]
Affiliations
[1] Georgia Inst Technol, Sch Elect & Comp Engn, Atlanta, GA 30332 USA
Source
2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-8, PROCEEDINGS | 2009
Keywords
Music; Hidden Markov models; Information retrieval; Vector quantization; Speech processing
DOI
10.1109/ICASSP.2009.4959973
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Music is an art form in which sounds are organized in time; however, current approaches for determining similarity and classification largely ignore temporal information. This paper presents an approach to automatic tagging which incorporates temporal aspects of music directly into the statistical models, unlike the typical bag-of-frames paradigm in traditional music information retrieval techniques. Vector quantization on song segments leads to a vocabulary of acoustic segment models. An unsupervised, iterative process that cycles between Viterbi decoding and Baum-Welch estimation builds transcripts of this vocabulary. Latent semantic analysis converts the song transcriptions into a vector for subsequent classification using a support vector machine for each tag. Experimental results demonstrate that the proposed approach performs better in 15 of the 18 tags. Further analysis demonstrates an ability to capture local timbral characteristics as well as sequential arrangements of acoustic segment models.
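The contrast the abstract draws between a bag-of-frames view and a sequence-aware view can be illustrated with a small sketch. The transcripts, the label names (`s1`, `s2`, `s3`), and the helper `ngram_counts` below are hypothetical and chosen only for illustration; the abstract does not specify how the transcripts are turned into counts, so the use of unigram and bigram counts here is an assumption, not the authors' stated method.

```python
from collections import Counter


def ngram_counts(transcript, n):
    """Count contiguous n-grams of segment labels in one song transcript."""
    return Counter(
        tuple(transcript[i:i + n]) for i in range(len(transcript) - n + 1)
    )


# Hypothetical transcripts: each symbol stands for one acoustic segment model.
song_a = ["s1", "s2", "s1", "s2", "s3", "s3"]
song_b = ["s3", "s1", "s1", "s2", "s2", "s3"]

# Order-free view: how often each segment label occurs.
unigrams_a = ngram_counts(song_a, 1)
unigrams_b = ngram_counts(song_b, 1)

# Order-aware view: how often each pair of adjacent labels occurs.
bigrams_a = ngram_counts(song_a, 2)
bigrams_b = ngram_counts(song_b, 2)

same_bag = unigrams_a == unigrams_b    # both songs use each label twice
same_order = bigrams_a == bigrams_b    # but the labels are arranged differently

print("identical unigram counts:", same_bag)
print("identical bigram counts:", same_order)
```

In this toy case the two songs are indistinguishable by label frequency alone and differ only once adjacent pairs are counted, which is the kind of sequential information an order-free representation discards.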
Pages: 1873-1876
Page count: 4