SMALL SAMPLE-SIZE EFFECTS IN STATISTICAL PATTERN-RECOGNITION - RECOMMENDATIONS FOR PRACTITIONERS

Cited by: 916
Authors
RAUDYS, SJ [1]
JAIN, AK [1]
Affiliation
[1] MICHIGAN STATE UNIV, DEPT COMP SCI, E LANSING, MI 48824 USA
Keywords
CLASSIFICATION ERROR; CLASSIFIER DESIGN; CURSE OF DIMENSIONALITY; FEATURE SELECTION; STATISTICAL PATTERN RECOGNITION; TEST SAMPLES; TRAINING SAMPLES;
DOI
10.1109/34.75512
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
During the last two decades a considerable amount of effort has been devoted to the analysis of the influence of both training and testing sample size on the design and performance of pattern recognition systems. These questions are interesting to practitioners as well as theoreticians, because the small-sample effects can easily contaminate the design and evaluation of a proposed system. For applications with a large number of features and a complex classification rule, the training sample size must be quite large. A large test sample is required to accurately evaluate a classifier with a low error rate. The design of a pattern recognition system consists of several stages: data collection, formation of the pattern classes, feature selection, specification of the classification algorithm, and estimation of the classification error. In this paper, we will discuss the effects of sample size on feature selection and error estimation for several types of classifier. In addition to surveying prior work in this area, our emphasis is on giving practical advice to today's designers and users of statistical pattern recognition systems.
Pages: 252-264
Number of pages: 13