Kernels as features: On kernels, margins, and low-dimensional mappings

Cited by: 67
Authors
Balcan, Maria-Florina [1]
Blum, Avrim [1]
Vempala, Santosh [2]
Affiliations
[1] Carnegie Mellon Univ, Dept Comp Sci, Pittsburgh, PA 15213 USA
[2] MIT, Dept Math, Cambridge, MA 02139 USA
Funding
U.S. National Science Foundation;
Keywords
Computer programming - Computer science - Information retrieval - Mapping - Problem solving - Software engineering;
DOI
10.1007/s10994-006-7550-1
CLC classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Kernel functions are typically viewed as providing an implicit mapping of points into a high-dimensional space, giving much of the power of that space without a high computational cost, provided the data is linearly separable in that space by a large margin $\gamma$. However, the Johnson-Lindenstrauss lemma suggests that in the presence of a large margin, a kernel function can also be viewed as a mapping into a low-dimensional space, one of dimension only $\tilde{O}(1/\gamma^2)$. In this paper, we explore the question of whether one can efficiently produce such low-dimensional mappings using only black-box access to a kernel function. That is, given just a program that computes K(x,y) on inputs x, y of our choosing, can we efficiently construct an explicit (small) set of features that effectively captures the power of the implicit high-dimensional space? We answer this question in the affirmative if our method is also allowed black-box access to the underlying data distribution (i.e., unlabeled examples). We also give a lower bound showing that, without access to the distribution, this is not possible for an arbitrary black-box kernel function; we leave open, however, whether it can be done for standard kernel functions such as the polynomial kernel. Our positive result can be viewed as saying that designing a good kernel function is much like designing a good feature space. Given a kernel, by running it in a black-box manner on random unlabeled examples, we can efficiently generate an explicit set of $\tilde{O}(1/\gamma^2)$ features such that, if the data was linearly separable with margin $\gamma$ under the kernel, then it is approximately separable in this new feature space.
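To make the positive result concrete, here is a minimal sketch of the simplest mapping the abstract suggests: evaluate the black-box kernel against a small sample of random unlabeled examples and use those kernel values as explicit features. The function name `kernels_as_features`, the RBF kernel used as the black box, and the choice of d = 50 are illustrative assumptions, not details from the paper (whose full construction includes refinements needed for the formal margin guarantee).

```python
import numpy as np

def kernels_as_features(kernel, unlabeled_pool, d, seed=0):
    """Build an explicit feature map from black-box kernel access.

    Draw d random unlabeled examples z_1, ..., z_d and map each point x
    to the vector (kernel(x, z_1), ..., kernel(x, z_d)).  Per the paper,
    roughly O~(1/gamma^2) such features suffice to approximately preserve
    linear separability at margin gamma.
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(unlabeled_pool), size=d, replace=False)
    landmarks = [unlabeled_pool[i] for i in idx]

    def feature_map(x):
        return np.array([kernel(x, z) for z in landmarks])

    return feature_map

if __name__ == "__main__":
    # Illustrative black-box kernel (Gaussian RBF) and unlabeled sample.
    rbf = lambda x, y: np.exp(-np.dot(x - y, x - y))
    pool = [np.random.randn(5) for _ in range(1000)]
    phi = kernels_as_features(rbf, pool, d=50)
    print(phi(np.random.randn(5)).shape)  # -> (50,)
```

Any standard linear learner (e.g., a linear SVM or the perceptron) can then be trained on the resulting 50-dimensional vectors in place of the implicit kernel-induced space.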
Pages: 79-94
Page count: 16
References
26 items in total
[1] Achlioptas, D. (2003). Database-friendly random projections: Johnson-Lindenstrauss with binary coins. Journal of Computer and System Sciences, 66(4), 671-687.
[2] [Anonymous]. (2004). Kernel Methods for Pattern Analysis.
[3] Arriaga, R. I. (1999). In 40th Annual Symposium on Foundations of Computer Science, p. 616. DOI: 10.1109/SFFCS.1999.814637.
[4] Bartlett, P. (1999). In Advances in Kernel Methods, p. 43.
[5] Ben-David, S. (2003). Journal of Machine Learning Research, 3, 441. DOI: 10.1162/153244303321897681.
[6] Ben-David, S. (2001). NIPS Workshop on Kernel-Based Learning.
[7] Ben-Israel, A. (1974). Generalized Inverses: Theory and Applications.
[8] Blake, C. L. (1998). UCI Repository of Machine Learning Databases.
[9] Boser, B. E. (1992). In Proceedings of the Fifth Annual ACM Workshop on Computational Learning Theory, p. 144. DOI: 10.1145/130385.130401.
[10] Cortes, C. (1995). Machine Learning, 20, 273. DOI: 10.1023/A:1022627411411.