A Neyman-Pearson approach to statistical learning

Cited by: 87
Authors
Scott, C [1 ]
Nowak, R
Affiliations
[1] Rice Univ, Dept Stat, Houston, TX 77005 USA
[2] Univ Wisconsin, Dept Elect & Comp Engn, Madison, WI 53706 USA
Funding
U.S. National Science Foundation;
Keywords
generalization error bounds; Neyman-Pearson (NP) classification; statistical learning theory;
DOI
10.1109/TIT.2005.856955
Chinese Library Classification
TP [Automation & Computer Technology];
Discipline code
0812 ;
Abstract
The Neyman-Pearson (NP) approach to hypothesis testing is useful in situations where different types of error have different consequences or a priori probabilities are unknown. For any alpha > 0, the NP lemma specifies the most powerful test of size alpha, but assumes the distributions for each hypothesis are known or (in some cases) the likelihood ratio is monotonic in an unknown parameter. This paper investigates an extension of NP theory to situations in which one has no knowledge of the underlying distributions, except for a collection of independent and identically distributed (i.i.d.) training examples from each hypothesis. Building on a "fundamental lemma" of Cannon et al., we demonstrate that several concepts from statistical learning theory have counterparts in the NP context. Specifically, we consider constrained versions of empirical risk minimization (NP-ERM) and structural risk minimization (NP-SRM), and prove performance guarantees for both. General conditions are given under which NP-SRM leads to strong universal consistency. We also apply NP-SRM to (dyadic) decision trees to derive rates of convergence. Finally, we present explicit algorithms to implement NP-SRM for histograms and dyadic decision trees.
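The constrained empirical risk minimization (NP-ERM) idea summarized in the abstract can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function name `np_erm`, the relaxation parameter `slack`, and the threshold classifiers in the usage example are hypothetical choices made here. The core idea matches the abstract: among candidate classifiers whose empirical false-alarm rate (on the hypothesis-0 sample) satisfies a relaxed constraint, select the one minimizing the empirical miss rate (on the hypothesis-1 sample).

```python
def np_erm(classifiers, x0, x1, alpha, slack=0.0):
    """NP-ERM over a finite classifier set (illustrative sketch).

    classifiers: list of functions mapping a sample to a label in {0, 1}
    x0: i.i.d. training sample from hypothesis 0 (the "null")
    x1: i.i.d. training sample from hypothesis 1
    alpha: target false-alarm level; slack: constraint relaxation

    Returns the index of the classifier with the smallest empirical
    miss rate among those whose empirical false-alarm rate is at most
    alpha + slack, or None if no classifier meets the constraint.
    """
    best_idx, best_miss = None, float("inf")
    for i, f in enumerate(classifiers):
        false_alarm = sum(f(x) for x in x0) / len(x0)  # fraction of H0 sample labeled 1
        miss = sum(1 - f(x) for x in x1) / len(x1)     # fraction of H1 sample labeled 0
        if false_alarm <= alpha + slack and miss < best_miss:
            best_idx, best_miss = i, miss
    return best_idx

# Usage: simple 1-D threshold detectors f_t(x) = 1{x >= t}.
thresholds = [0, 1, 2, 3, 4, 5, 6]
classifiers = [lambda x, t=t: int(x >= t) for t in thresholds]
x0 = [0, 1, 2, 3]  # sample from hypothesis 0
x1 = [2, 3, 4, 5]  # sample from hypothesis 1
idx = np_erm(classifiers, x0, x1, alpha=0.25)
# thresholds[idx] is the selected detection threshold
```

Tightening `alpha` shrinks the feasible set and typically raises the miss rate, which is the trade-off the NP formulation makes explicit.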
Pages: 3806-3819
Page count: 14
References
38 entries in total
[1] Bartlett PL, Boucheron S, Lugosi G. Model selection and error estimation. Machine Learning, 2002, 48(1-3): 85-113.
[2] Blanchard G, Schäfer C, Rozenholc Y. Oracle bounds and exact algorithm for dyadic classification trees. Learning Theory, Proceedings, 2004, 3120: 378-392.
[3] BLANCHARD G, 2005, OPTIMAL DYADIC DECIS
[4] BOUCHERON S, 2004, THEORY CLASSIFICATIO
[5] Cannon A. 2002, Tech. Rep. LA-UR 02-2951.
[6] Casasent D, Chen XW. Radial basis function neural networks for nonlinear Fisher discrimination and Neyman-Pearson classification. Neural Networks, 2003, 16(5-6): 529-535.
[7] Cohen A, Dahmen W, Daubechies I, DeVore R. Tree approximation and optimal encoding. Applied and Computational Harmonic Analysis, 2001, 11(2): 192-226.
[8] DeVore RA. Acta Numerica, 1998, 7: 51. DOI: 10.1017/S0962492900002816.
[10] Devroye L. A Probabilistic Theory of Pattern Recognition. 1996.