On the relationship between training sample size and data dimensionality: Monte Carlo analysis of broadband multi-temporal classification

Cited by: 150
Authors
Van Niel, TG
McVicar, TR
Datt, B
Affiliations
[1] CSIRO Land & Water, Wembley, WA 6913, Australia
[2] CSIRO Land & Water, Canberra, ACT 2601, Australia
[3] CSIRO, Earth Observat Ctr, Canberra, ACT 2601, Australia
[4] Cooperat Res Ctr Sustainable Rice Prod, Yanco, NSW 2703, Australia
Keywords
crop classification; dimensionality; training sample; time-series; multi-temporal; maximum likelihood
DOI
10.1016/j.rse.2005.08.011
Chinese Library Classification number
X [Environmental Science, Safety Science]
Discipline classification code
08; 0830
Abstract
The number of training samples per class (n) required for accurate Maximum Likelihood (ML) classification is known to be affected by the number of bands (p) in the input image. However, the general rule which defines that n should be 10p to 30p is often enforced universally in remote sensing without questioning its relevance to the complexity of the specific discrimination problem. Furthermore, identifying this many training samples is often problematic when many classes and/or many bands are used. It is important, then, to test how this generally accepted rule matches common remote sensing discrimination problems because it could be unnecessarily restrictive for many applications. This study was primarily conducted in order to test whether the general rule defining the relationship between n and p was well-suited for ML classification of a relatively simple remote sensing-based discrimination problem. To summarise the mean response of n-to-p for our study site, a Monte Carlo procedure was used to randomly stack various numbers of bands into thousands of separate image combinations that were then classified using an ML algorithm. The bands were randomly selected from a 119-band Enhanced Thematic Mapper-plus (ETM+) dataset comprised of 17 images acquired during the 2001-2002 southern hemisphere summer agricultural growing season over an irrigation area in south-eastern Australia. Results showed that the number of training samples needed for accurate ML classification was much lower than the current widely accepted rule. Due to the asymptotic nature of the relationship, we found that 95% of the accuracy attained using n = 30p samples could be achieved by using approximately 2p to 4p samples, or <= 1/7th the currently recommended value of n.
Our findings show that the number of training samples needed for a simple discrimination problem is much less than that defined by the general rule and therefore the rule should not be universally enforced; the number of training samples needed should also be determined by considering the complexity of the discrimination problem. (C) 2005 Elsevier Inc. All rights reserved.
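The Monte Carlo procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes synthetic Gaussian data in place of the ETM+ imagery, three hypothetical classes, and a plain Gaussian maximum likelihood classifier written with NumPy. The function names (`fit_ml`, `predict_ml`, `mean_accuracy`, `make_synthetic`) and all parameter values are illustrative assumptions. It only shows the shape of the experiment: draw p bands at random, train with n = k * p samples per class, classify, and average accuracy over many trials.

```python
import numpy as np


def fit_ml(X, y):
    """Estimate a mean vector and covariance matrix for each class."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        # atleast_2d keeps the covariance a matrix even when p == 1
        params[int(c)] = (Xc.mean(axis=0), np.atleast_2d(np.cov(Xc, rowvar=False)))
    return params


def predict_ml(params, X):
    """Assign each row of X to the class with the highest Gaussian log-likelihood."""
    classes = sorted(params)
    scores = []
    for c in classes:
        mu, cov = params[c]
        diff = X - mu
        _, logdet = np.linalg.slogdet(cov)
        maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
        scores.append(-0.5 * (logdet + maha))
    return np.array(classes)[np.argmax(np.vstack(scores), axis=0)]


def mean_accuracy(train_X, train_y, test_X, test_y, p, k, trials, rng):
    """Average test accuracy over random p-band stacks, with n = k * p samples per class."""
    n = k * p
    accs = []
    for _ in range(trials):
        # Randomly stack p of the available bands (one Monte Carlo draw)
        bands = rng.choice(train_X.shape[1], size=p, replace=False)
        # Randomly pick n training pixels from each class
        rows = np.concatenate([
            rng.choice(np.flatnonzero(train_y == c), size=n, replace=False)
            for c in np.unique(train_y)
        ])
        params = fit_ml(train_X[np.ix_(rows, bands)], train_y[rows])
        pred = predict_ml(params, test_X[:, bands])
        accs.append(np.mean(pred == test_y))
    return float(np.mean(accs))


def make_synthetic(means, n_per_class, rng):
    """Draw unit-variance Gaussian pixels around each class mean (synthetic stand-in)."""
    X = np.vstack([m + rng.normal(size=(n_per_class, means.shape[1])) for m in means])
    y = np.repeat(np.arange(len(means)), n_per_class)
    return X, y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    means = rng.normal(scale=1.5, size=(3, 119))  # 3 hypothetical classes, 119 bands
    train_X, train_y = make_synthetic(means, 400, rng)
    test_X, test_y = make_synthetic(means, 300, rng)
    p = 6
    for k in (2, 4, 10, 30):
        acc = mean_accuracy(train_X, train_y, test_X, test_y, p, k, trials=50, rng=rng)
        print(f"p={p}, n={k * p} per class: mean accuracy {acc:.3f}")
```

As a worked example of the sample sizes involved: for p = 6 bands, the 10p to 30p rule asks for 60 to 180 training samples per class, whereas 2p to 4p corresponds to 12 to 24. The accuracies printed by this sketch depend entirely on the synthetic data and say nothing about the study's results.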
Pages: 468-480
Page count: 13