Multi-step learning and underlying structure in statistical models

Cited: 0
Authors
Fraser, Maia [1]
Affiliations
[1] Univ Ottawa, Dept Math & Stat, Brain & Mind Res Inst, Ottawa, ON K1N 6N5, Canada
Source
Advances in Neural Information Processing Systems 29 (NIPS 2016) | 2016 / Vol. 29
Keywords
Manifold regularization
DOI
None available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
In multi-step learning, where a final learning task is accomplished via a sequence of intermediate learning tasks, the intuition is that successive steps or levels transform the initial data into representations increasingly "suited" to the final learning task. A related principle arises in transfer learning, where Baxter (2000) proposed a theoretical framework to study how learning multiple tasks transforms the inductive bias of a learner. The most widespread multi-step learning approach is semi-supervised learning (SSL), which proceeds in two steps: unsupervised, then supervised. Several authors (Castelli-Cover, 1996; Balcan-Blum, 2005; Niyogi, 2008; Ben-David et al., 2008; Urner et al., 2011) have analyzed SSL, with Balcan-Blum (2005) proposing a version of the PAC learning framework augmented by a "compatibility function" linking the concept class and the unlabeled data distribution. We propose to analyze SSL and other multi-step learning approaches, much in the spirit of Baxter's framework, by defining a learning problem generatively as a joint statistical model on X × Y. This determines in a natural way the class of conditional distributions that are possible with each marginal, and amounts to an abstract form of compatibility function. It also allows one to analyze both discrete and non-discrete settings. As a tool for our analysis, we define a notion of gamma-uniform shattering for statistical models. We use this to give conditions on the marginal and conditional models which imply an advantage for multi-step learning approaches. In particular, we recover a more general version of a result of Poggio et al. (2012): under mild hypotheses, a multi-step approach which learns features invariant under successive factors of a finite group of invariances has sample complexity requirements that are additive rather than multiplicative in the sizes of the subgroups.
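To make the final claim concrete, here is a hedged sketch of the additive-versus-multiplicative contrast (the notation G, G_i, n_i and the exact bounds below are illustrative assumptions, not formulas taken from the paper). Suppose the invariance group factors as a product of subgroups $G = G_1 \cdots G_k$ with $|G_i| = n_i$. A one-step learner that must account for all of $G$ at once faces on the order of $\prod_i n_i$ group elements, while a multi-step learner that quotients out one factor of invariance per step pays for each factor separately:

$$ m_{\text{one-step}} = \Omega\!\Big(\prod_{i=1}^{k} n_i\Big) \qquad \text{versus} \qquad m_{\text{multi-step}} = O\!\Big(\sum_{i=1}^{k} n_i\Big). $$

For example, with $k = 2$ and $n_1 = n_2 = 100$, this is the difference between on the order of 10,000 samples and on the order of 200.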
Pages: 9
References (19 in total)
[1] Ahissar, M.; Hochstein, S. The reverse hierarchy theory of visual perceptual learning. Trends in Cognitive Sciences, 2004, 8(10): 457-464.
[2] Alain, G. Technical report, 2012.
[3] [Anonymous]. COLT, 2008.
[4] [Anonymous]. Probabilistic Theory, 1996.
[5] Balcan, M. F.; Blum, A. A PAC-style model for learning from labeled and unlabeled data. Learning Theory, Proceedings, 2005, 3559: 111-126.
[6] Baxter, J. A model of inductive bias learning. Journal of Artificial Intelligence Research, 2000, 12: 149-198.
[7] Belkin, M. Journal of Machine Learning Research, 2006, 7: 2399.
[8] Bourne, J. A.; Rosa, M. G. P. Hierarchical development of the primate visual cortex, as revealed by neurofilament immunoreactivity: early maturation of the middle temporal area (MT). Cerebral Cortex, 2006, 16(3): 405-414.
[9] Castelli, V.; Cover, T. M. The relative value of labeled and unlabeled samples in pattern recognition with an unknown mixing parameter. IEEE Transactions on Information Theory, 1996, 42(6): 2102-2117.
[10] Haussler, D. Generalizing the PAC model: sample size bounds from metric dimension-based uniform convergence results. 1989: 40.