Statistical learning theory of structured data

Cited by: 14
Authors
Pastore, Mauro [1 ,2 ]
Rotondo, Pietro [1 ,2 ]
Erba, Vittorio [1 ,2 ]
Gherardi, Marco [1 ,2 ]
Affiliations
[1] Univ Milan, Dipartimento Fis, Via Celoria 16, I-20133 Milan, Italy
[2] Ist Nazl Fis Nucl, Via Celoria 16, I-20133 Milan, Italy
Funding
European Union Horizon 2020;
Keywords
SATISFIABILITY;
DOI
10.1103/PhysRevE.102.032119
Chinese Library Classification (CLC)
O35 [Fluid Mechanics]; O53 [Plasma Physics];
Discipline classification codes
070204 ; 080103 ; 080704 ;
Abstract
The traditional approach of statistical physics to supervised learning routinely assumes unrealistic generative models for the data: usually inputs are independent random variables, uncorrelated with their labels. Only recently have statistical physicists started to explore more complex forms of data, such as equally labeled points lying on (possibly low-dimensional) object manifolds. Here we provide a bridge between this recently established research area and the framework of statistical learning theory, a branch of mathematics devoted to inference in machine learning. The overarching motivation is the inadequacy of the classic rigorous results in explaining the remarkable generalization properties of deep learning. We propose a way to integrate physical models of data into statistical learning theory and address, with both combinatorial and statistical mechanics methods, the computation of the Vapnik-Chervonenkis entropy, which counts the number of different binary classifications compatible with the loss class. As a proof of concept, we focus on kernel machines and on two simple realizations of data structure introduced in recent physics literature: k-dimensional simplexes with prescribed geometric relations and spherical manifolds (equivalent to margin classification). Contrary to what happens for unstructured data, the entropy is nonmonotonic in the sample size, at variance with the rigorous bounds. Moreover, data structure induces a transition beyond the storage capacity, which we advocate as a proxy of the nonmonotonicity and, ultimately, a cue of low generalization error. The identification of a synaptic volume vanishing at the transition allows a quantification of the impact of data structure within replica theory, applicable in cases where combinatorial methods are not available, as we demonstrate for margin learning.
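For context, a standard textbook definition of the quantity named in the abstract (this is the usual convention of statistical learning theory, not necessarily the exact normalization adopted in the paper): for a function class F and a sample of p inputs, the growth function counts the distinct binary labelings realizable on that sample, and the Vapnik-Chervonenkis entropy is its averaged logarithm,

\[
N_{\mathcal{F}}(x_1,\dots,x_p) \;=\; \bigl|\,\{(\operatorname{sign} f(x_1),\dots,\operatorname{sign} f(x_p)) : f \in \mathcal{F}\}\,\bigr|,
\qquad
H(p) \;=\; \mathbb{E}_{x_1,\dots,x_p}\!\bigl[\ln N_{\mathcal{F}}(x_1,\dots,x_p)\bigr],
\]

with the "annealed" variant obtained by exchanging the logarithm and the average, H_ann(p) = ln E[N_F]. The nonmonotonicity discussed in the abstract refers to the behavior of this entropy as a function of the sample size p when the inputs are drawn from structured ensembles rather than i.i.d. random points.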
Pages: 17