DENSITY-SENSITIVE SEMISUPERVISED INFERENCE

Cited by: 14
Authors
Azizyan, Martin [1, 2]
Singh, Aarti [1, 2]
Wasserman, Larry [1, 2]
Affiliations
[1] Carnegie Mellon Univ, Dept Stat, Pittsburgh, PA 15213 USA
[2] Carnegie Mellon Univ, Machine Learning Dept, Pittsburgh, PA 15213 USA
Keywords
Nonparametric inference; semisupervised; kernel density; efficiency
DOI
10.1214/13-AOS1092
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
Semisupervised methods are techniques for using labeled data $(X_1, Y_1), \ldots, (X_n, Y_n)$ together with unlabeled data $X_{n+1}, \ldots, X_N$ to make predictions. These methods invoke some assumptions that link the marginal distribution $P_X$ of $X$ to the regression function $f(x)$. For example, it is common to assume that $f$ is very smooth over high density regions of $P_X$. Many of the methods are ad hoc and have been shown to work in specific examples but are lacking a theoretical foundation. We provide a minimax framework for analyzing semisupervised methods. In particular, we study methods based on metrics that are sensitive to the distribution $P_X$. Our model includes a parameter $\alpha$ that controls the strength of the semisupervised assumption. We then use the data to adapt to $\alpha$.
Pages: 751-771
Number of pages: 21