Statistical learning theory of structured data

Cited by: 14
Authors
Pastore, Mauro [1 ,2 ]
Rotondo, Pietro [1 ,2 ]
Erba, Vittorio [1 ,2 ]
Gherardi, Marco [1 ,2 ]
Affiliations
[1] Univ Milan, Dipartimento Fis, Via Celoria 16, I-20133 Milan, Italy
[2] Ist Nazl Fis Nucl, Via Celoria 16, I-20133 Milan, Italy
Funding
European Union Horizon 2020;
Keywords
SATISFIABILITY;
DOI
10.1103/PhysRevE.102.032119
Chinese Library Classification (CLC)
O35 [Fluid Mechanics]; O53 [Plasma Physics];
Discipline classification codes
070204 ; 080103 ; 080704 ;
Abstract
The traditional approach of statistical physics to supervised learning routinely assumes unrealistic generative models for the data: usually inputs are independent random variables, uncorrelated with their labels. Only recently have statistical physicists started to explore more complex forms of data, such as equally labeled points lying on (possibly low-dimensional) object manifolds. Here we provide a bridge between this recently established research area and the framework of statistical learning theory, a branch of mathematics devoted to inference in machine learning. The overarching motivation is the inadequacy of the classic rigorous results in explaining the remarkable generalization properties of deep learning. We propose a way to integrate physical models of data into statistical learning theory and address, with both combinatorial and statistical mechanics methods, the computation of the Vapnik-Chervonenkis entropy, which counts the number of different binary classifications compatible with the loss class. As a proof of concept, we focus on kernel machines and on two simple realizations of data structure introduced in recent physics literature: k-dimensional simplexes with prescribed geometric relations and spherical manifolds (equivalent to margin classification). Contrary to what happens for unstructured data, the entropy is nonmonotonic in the sample size, at variance with the rigorous bounds. Moreover, data structure induces a transition beyond the storage capacity, which we advocate as a proxy of the nonmonotonicity and, ultimately, a cue of low generalization error. The identification of a synaptic volume vanishing at the transition allows a quantification of the impact of data structure within replica theory, applicable in cases where combinatorial methods are not available, as we demonstrate for margin learning.
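For context, a standard textbook definition of the quantity named in the abstract (this is the usual convention of statistical learning theory, not necessarily the exact normalization adopted in the paper): for a function class F and a sample of p inputs, the growth function counts the distinct binary labelings realizable on that sample, and the Vapnik-Chervonenkis entropy is its averaged logarithm,

\[
N_{\mathcal{F}}(x_1,\dots,x_p) \;=\; \bigl|\,\{(\operatorname{sign} f(x_1),\dots,\operatorname{sign} f(x_p)) : f \in \mathcal{F}\}\,\bigr|,
\qquad
H(p) \;=\; \mathbb{E}_{x_1,\dots,x_p}\!\bigl[\ln N_{\mathcal{F}}(x_1,\dots,x_p)\bigr],
\]

with the "annealed" variant obtained by exchanging the logarithm and the average, H_ann(p) = ln E[N_F]. The nonmonotonicity discussed in the abstract refers to the behavior of this entropy as a function of the sample size p when the inputs are drawn from structured ensembles rather than i.i.d. random points.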
Pages: 17