Kernels as features: On kernels, margins, and low-dimensional mappings

Cited by: 67
Authors
Balcan, Maria-Florina [1]
Blum, Avrim [1]
Vempala, Santosh [2]
Affiliations
[1] Carnegie Mellon Univ, Dept Comp Sci, Pittsburgh, PA 15213 USA
[2] MIT, Dept Math, Cambridge, MA 02139 USA
Funding
U.S. National Science Foundation;
Keywords
Computer programming - Computer science - Information retrieval - Mapping - Problem solving - Software engineering;
DOI
10.1007/s10994-006-7550-1
CLC classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Kernel functions are typically viewed as providing an implicit mapping of points into a high-dimensional space, giving much of the power of that space without a high computational cost, provided the data is linearly separable in that space by a large margin $\gamma$. However, the Johnson-Lindenstrauss lemma suggests that in the presence of a large margin, a kernel function can also be viewed as a mapping into a low-dimensional space, one of dimension only $\tilde{O}(1/\gamma^2)$. In this paper, we explore the question of whether one can efficiently produce such low-dimensional mappings using only black-box access to a kernel function. That is, given just a program that computes K(x,y) on inputs x, y of our choosing, can we efficiently construct an explicit (small) set of features that effectively captures the power of the implicit high-dimensional space? We answer this question in the affirmative if our method is also allowed black-box access to the underlying data distribution (i.e., unlabeled examples). We also give a lower bound showing that, without access to the distribution, this is not possible for an arbitrary black-box kernel function; we leave open, however, whether it can be done for standard kernel functions such as the polynomial kernel. Our positive result can be viewed as saying that designing a good kernel function is much like designing a good feature space. Given a kernel, by running it in a black-box manner on random unlabeled examples, we can efficiently generate an explicit set of $\tilde{O}(1/\gamma^2)$ features such that, if the data was linearly separable with margin $\gamma$ under the kernel, then it is approximately separable in this new feature space.
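To make the positive result concrete, here is a minimal sketch of the simplest mapping the abstract suggests: evaluate the black-box kernel against a small sample of random unlabeled examples and use those kernel values as explicit features. The function name `kernels_as_features`, the RBF kernel used as the black box, and the choice of d = 50 are illustrative assumptions, not details from the paper (whose full construction includes refinements needed for the formal margin guarantee).

```python
import numpy as np

def kernels_as_features(kernel, unlabeled_pool, d, seed=0):
    """Build an explicit feature map from black-box kernel access.

    Draw d random unlabeled examples z_1, ..., z_d and map each point x
    to the vector (kernel(x, z_1), ..., kernel(x, z_d)).  Per the paper,
    roughly O~(1/gamma^2) such features suffice to approximately preserve
    linear separability at margin gamma.
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(unlabeled_pool), size=d, replace=False)
    landmarks = [unlabeled_pool[i] for i in idx]

    def feature_map(x):
        return np.array([kernel(x, z) for z in landmarks])

    return feature_map

if __name__ == "__main__":
    # Illustrative black-box kernel (Gaussian RBF) and unlabeled sample.
    rbf = lambda x, y: np.exp(-np.dot(x - y, x - y))
    pool = [np.random.randn(5) for _ in range(1000)]
    phi = kernels_as_features(rbf, pool, d=50)
    print(phi(np.random.randn(5)).shape)  # -> (50,)
```

Any standard linear learner (e.g., a linear SVM or the perceptron) can then be trained on the resulting 50-dimensional vectors in place of the implicit kernel-induced space.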
Pages: 79-94
Page count: 16
References
26 items in total
[1] Achlioptas, D. (2003). Database-friendly random projections: Johnson-Lindenstrauss with binary coins. Journal of Computer and System Sciences, 66(4), 671-687.
[2] [Anonymous]. (2004). Kernel Methods for Pattern Analysis.
[3] Arriaga, R. I. (1999). In 40th Annual Symposium on Foundations of Computer Science, p. 616. DOI: 10.1109/SFFCS.1999.814637.
[4] Bartlett, P. (1999). In Advances in Kernel Methods, p. 43.
[5] Ben-David, S. (2003). Journal of Machine Learning Research, 3, 441. DOI: 10.1162/153244303321897681.
[6] Ben-David, S. (2001). NIPS Workshop on Kernel-Based Learning.
[7] Ben-Israel, A. (1974). Generalized Inverses: Theory and Applications.
[8] Blake, C. L. (1998). UCI Repository of Machine Learning Databases.
[9] Boser, B. E. (1992). In Proceedings of the Fifth Annual ACM Workshop on Computational Learning Theory, p. 144. DOI: 10.1145/130385.130401.
[10] Cortes, C. (1995). Machine Learning, 20, 273. DOI: 10.1023/A:1022627411411.