Analysis of minimum distances in high-dimensional musical spaces

被引：42

作者：

Casey, Michael ^{[1
,2
]}

Rhodes, Christophe ^{[1
]}

Slaney, Malcolm ^{[3
]}

机构：

[1] Univ London Goldsmiths Coll, Dept Comp, London SE14 6NW, England

[2] Dartmouth Coll, Dept Mus, Hanover, NH 03755 USA

[3] Yahoo Res Inc, Santa Clara, CA 95054 USA

来源：

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2008年 / 16卷 / 05期

基金：

英国工程与自然科学研究理事会;

关键词：

audio shingles; distance distributions; locality-sensitive hashing; matched-filter distance; music similarity;

D O I：

10.1109/TASL.2008.925883

中图分类号：

O42 [声学];

学科分类号：

070206 ; 082403 ;

摘要：

We propose an automatic method for measuring content-based music similarity, enhancing the current generation of music search engines and recommender systems. Many previous approaches to track similarity require brute-force, pair-wise processing between all audio features in a database and therefore are not practical for large collections. However, in an Internet-connected world, where users have access to millions of musical tracks, efficiency is crucial. Our approach uses features extracted from unlabeled audio data and near-neigbor retrieval using a distance threshold, determined by analysis, to solve a range of retrieval tasks. The tasks require temporal features-analogous to the technique of shingling used for text retrieval. To measure similarity, we count pairs of audio shingles, between a query and target track, that are below a distance threshold. The distribution of between-shingle distances is different for each database; therefore, we present an analysis of the distribution of minimum distances between shingles and a method for estimating a distance threshold for optimal retrieval performance. The method is compatible with locality-sensitive hashing (LSH)-allowing implementation with retrieval times several orders of magnitude faster than those using exhaustive distance computations. We evaluate the performance of our proposed method on three contrasting music similarity tasks: retrieval of mis-attributed recordings (fingerprint), retrieval of the same work performed by different artists (cover songs), and retrieval of edited and sampled versions of a query track by remix artists (remixes). Our method achieves near-perfect performance in the first two tasks and 75% precision at 70% recall in the third task. Each task was performed on a test database comprising 4.5 million audio shingles.

引用

页码：1015 / 1028

页数：14

共 30 条

[1] Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions [J].

Andoni, Alexandr ;

Indyk, Piotr .

COMMUNICATIONS OF THE ACM, 2008, 51 (01) :117-122

[2]

[Anonymous], P 7 INT C MUS INF RE

[3]

[Anonymous], 2002, Proc. 3rd International Conference on Music Information Retrieval (ISMIR 2002)

[4]

[Anonymous], 1979, Theoretical statistics

[5]

BALUJA S, 2007, P IEEE INT C AC SPEE, V2, P213

[6]

Baluja S, 2007, 20TH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, P2663

[7] To catch a chorus: Using chroma-based representations for audio thumbnailing [J].

Bartsch, MA ;

Wakefield, GH .

PROCEEDINGS OF THE 2001 IEEE WORKSHOP ON THE APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS, 2001, :15-18

[8]

Beyer K, 1999, LECT NOTES COMPUT SC, V1540, P217

[9] A cost model for query processing in high dimensional data spaces [J].

Böhm, C .

ACM TRANSACTIONS ON DATABASE SYSTEMS, 2000, 25 (02) :129-178

[10]

Broder A. Z., 1997, P 6 INT WORLD WID WE, V29, P1157, DOI [DOI 10.1016/S0169-7552(97)00031-7, 10.1016/S0169-7552(97)00031-7]

← 1 2 3 →