A polynomial-time algorithm for learning noisy linear threshold functions

Cited by: 74
Authors
Blum, A [1]
Frieze, A
Kannan, R
Vempala, S
Affiliations
[1] Carnegie Mellon Univ, Sch Comp Sci, Pittsburgh, PA 15213 USA
[2] Carnegie Mellon Univ, Dept Math Sci, Pittsburgh, PA 15213 USA
Keywords
computational learning theory; linear threshold functions; perceptron algorithm; learning with noise
DOI
10.1007/PL00013833
CLC Number (Chinese Library Classification)
TP31 [Computer Software]
Subject Classification Code
081202; 0835
Abstract
In this paper we consider the problem of learning a linear threshold function (a half-space in n dimensions, also called a "perceptron"). Methods for solving this problem generally fall into two categories. In the absence of noise, the problem can be formulated as a linear program and solved in polynomial time with the Ellipsoid Algorithm or interior-point methods. Alternatively, simple greedy algorithms such as the Perceptron Algorithm are often used in practice and have certain provable noise-tolerance properties; but their running time depends on a separation parameter, which quantifies the amount of "wiggle room" available for a solution and can be exponential in the description length of the input. In this paper we show how simple greedy methods can be used to find weak hypotheses (hypotheses that correctly classify noticeably more than half of the examples) in polynomial time, without dependence on any separation parameter. Suitably combining these hypotheses yields a polynomial-time algorithm for learning linear threshold functions in the PAC model in the presence of random classification noise, and likewise a polynomial-time algorithm for learning linear threshold functions in the Statistical Query model of Kearns. Our algorithm is based on a new method for removing outliers in data. Specifically, for any set S of points in R^n, each given to b bits of precision, we show that one can remove only a small fraction of S so that in the remaining set T, for every vector v, max_{x in T} (v · x)^2 <= poly(n, b) · E_{x in T}[(v · x)^2]; i.e., for any hyperplane through the origin, the maximum squared distance from a point in T to the plane is at most polynomially larger than the average. After removing these outliers, we show that a modified version of the Perceptron Algorithm finds a weak hypothesis in polynomial time, even in the presence of random classification noise.
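The outlier-removal condition stated in the abstract can be made concrete with a small numerical sketch. The Python snippet below is an illustration only, not the authors' procedure: the choice of the top eigenvector of the second-moment matrix as a candidate "bad" direction, the `bound` threshold, and all function names are assumptions introduced here purely to demonstrate the max-versus-average squared-projection condition.

```python
# Illustrative sketch only; NOT the paper's outlier-removal algorithm.
# It checks the abstract's condition along one heuristic direction v:
#     max_{x in T} (v . x)^2  <=  bound * E_{x in T} [(v . x)^2]
# and greedily drops the most extreme point until the condition holds.
import numpy as np

def worst_direction_ratio(points):
    """Return (ratio, v) where ratio = max_x (v.x)^2 / mean_x (v.x)^2 for the
    top eigenvector v of the empirical second-moment matrix. This is only a
    heuristic choice of direction; the true condition quantifies over all v."""
    M = points.T @ points / len(points)      # empirical second-moment matrix E[x x^T]
    _, V = np.linalg.eigh(M)
    v = V[:, -1]                             # direction of largest average spread
    proj2 = (points @ v) ** 2
    return proj2.max() / max(proj2.mean(), 1e-12), v

def remove_outliers(points, bound, max_removed_frac=0.1):
    """Greedily remove the single most extreme point along the current worst
    direction until the max/mean ratio drops below `bound` or the allowed
    fraction of points has been removed."""
    pts = points.copy()
    for _ in range(int(max_removed_frac * len(points))):
        ratio, v = worst_direction_ratio(pts)
        if ratio <= bound:
            break
        worst = np.argmax((pts @ v) ** 2)
        pts = np.delete(pts, worst, axis=0)
    return pts

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S = rng.normal(size=(500, 5))
    S[0] *= 100.0                            # plant one gross outlier
    T = remove_outliers(S, bound=50.0)
    print(len(S) - len(T), "points removed")
```

In this toy run, removing a handful of planted outliers restores the property that no single point dominates the average squared projection along the tested direction, which is the kind of guarantee the paper establishes (over all directions, with a poly(n, b) factor) before running its modified Perceptron Algorithm.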
Pages: 35-52
Number of pages: 18
References
20 records in total
[1] Agmon, S. The relaxation method for linear inequalities. Canadian Journal of Mathematics, 1954, 6(3): 382-392.
[2] Amaldi, E. PhD thesis, Swiss Federal Institute of Technology, 1994.
[3] Anderson, J. A. Neurocomputing: Foundations of Research, 1988.
[4] [Anonymous]. Soviet Math. Dokl., 1979.
[5] Aslam, J. A. Proceedings of the 34th Annual Symposium on Foundations of Computer Science (FOCS 1993), 1993: 282. DOI: 10.1109/SFCS.1993.366859.
[6] Aslam, J. A. Technical Report TR1794, Harvard University, 1994.
[7] Aspnes, J. Proceedings of the Twenty-Third Annual ACM Symposium on Theory of Computing (STOC '91), 1991: 402. DOI: 10.1145/103418.103461.
[8] Bylander, T. Proceedings of the Seventh Annual ACM Conference on Computational Learning Theory (COLT '94), 1994: 340. DOI: 10.1145/180139.181176.
[9] Bylander, T. Proceedings of the Sixth Annual ACM Conference on Computational Learning Theory (COLT '93), 1993: 297. DOI: 10.1145/168304.168356.
[10] Cohen, E. Learning noisy perceptrons by a perceptron in polynomial time. 38th Annual Symposium on Foundations of Computer Science (FOCS 1997), 1997: 514-523.