Blessing of dimensionality: mathematical foundations of the statistical physics of data

Cited by: 109
Authors
Gorban, A. N. [1 ]
Tyukin, I. Y. [1 ,2 ]
Affiliations
[1] Univ Leicester, Dept Math, Leicester LE1 7RH, Leics, England
[2] St Petersburg State Electrotech Univ, Dept Automat & Control Proc, St Petersburg 197376, Russia
Source
PHILOSOPHICAL TRANSACTIONS OF THE ROYAL SOCIETY A-MATHEMATICAL PHYSICAL AND ENGINEERING SCIENCES | 2018, Vol. 376, No. 2118
Funding
Innovate UK;
Keywords
measure concentration; extreme points; ensemble equivalence; Fisher's discriminant; linear separability; NEURONS; INEQUALITIES; NETWORKS; MACHINE;
DOI
10.1098/rsta.2017.0237
Chinese Library Classification (CLC)
O [Mathematical Sciences and Chemistry]; P [Astronomy, Earth Sciences]; Q [Biosciences]; N [General Natural Sciences];
Discipline codes
07; 0710; 09;
Abstract
The concentration of measure phenomena were discovered as the mathematical background to statistical mechanics at the end of the nineteenth/beginning of the twentieth century and have been explored in mathematics ever since. At the beginning of the twenty-first century, it became clear that the proper utilization of these phenomena in machine learning might transform the curse of dimensionality into the blessing of dimensionality. This paper summarizes recently discovered phenomena of measure concentration which drastically simplify some machine learning problems in high dimension and allow us to correct legacy artificial intelligence systems. The classical concentration of measure theorems state that i.i.d. random points are concentrated in a thin layer near a surface (a sphere or equators of a sphere, an average or median-level set of energy or another Lipschitz function, etc.). The new stochastic separation theorems describe the fine structure of these thin layers: the random points are not only concentrated in a thin layer but are all linearly separable from the rest of the set, even for exponentially large random sets. The linear functionals for separation of points can be selected in the form of the linear Fisher's discriminant. All artificial intelligence systems make errors. Non-destructive correction requires separation of the situations (samples) with errors from the samples corresponding to correct behaviour by a simple and robust classifier. The stochastic separation theorems provide us with such classifiers and determine a non-iterative (one-shot) procedure for their construction. This article is part of the theme issue 'Hilbert's sixth problem'.
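The separability property described in the abstract can be checked empirically. The following minimal sketch (not taken from the paper; the dimension `d`, sample size `n`, and margin `alpha` are illustrative assumptions) samples points uniformly from the unit ball in R^d and measures the fraction of points x that are separated from every other sample y by the simple linear functional test <x, y> < alpha <x, x>, a Fisher-discriminant-style separation rule:

```python
import numpy as np

# Illustrative parameters (assumptions, not from the paper).
rng = np.random.default_rng(0)
d, n, alpha = 100, 2000, 0.9

# Uniform sampling in the unit ball: uniform direction on the sphere,
# radius with density proportional to r^(d-1).
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)
X *= rng.random((n, 1)) ** (1.0 / d)

G = X @ X.T                    # Gram matrix of inner products <x_i, x_j>
thresh = alpha * np.diag(G)    # per-point separation threshold alpha * <x_i, x_i>
ok = G < thresh[:, None]       # does <x_i, x_j> < alpha * <x_i, x_i> hold?
np.fill_diagonal(ok, True)     # a point need not be separated from itself
frac = ok.all(axis=1).mean()   # fraction of points separable from all others
print(f"d={d}: fraction Fisher-separable = {frac:.3f}")
```

In high dimension (d = 100 here), virtually all of the 2000 random points pass the test, illustrating the "blessing of dimensionality"; rerunning with a small d (say d = 3) makes the fraction collapse, which is the familiar low-dimensional picture.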
Pages: 18