Tree Species Abundance Predictions in a Tropical Agricultural Landscape with a Supervised Classification Model and Imbalanced Data

Times cited: 63
Authors
Graves, Sarah J. [1]
Asner, Gregory P. [2]
Martin, Roberta E. [2]
Anderson, Christopher B. [2]
Colgan, Matthew S. [2]
Kalantari, Leila [3]
Bohlman, Stephanie A. [1, 4]
Affiliations
[1] Univ Florida, Sch Forest Resources & Conservat, POB 11041, Gainesville, FL 32611 USA
[2] Carnegie Inst Sci, Dept Global Ecol, 260 Panama St, Stanford, CA 94305 USA
[3] Univ Florida, Dept Comp & Informat Sci & Engn, POB 116120, Gainesville, FL 32611 USA
[4] Smithsonian Trop Res Inst, Apartado 0843-03092, Balboa, Ancon, Panama
Keywords
Support Vector Machine; imaging spectroscopy; class imbalance; tropics; agriculture; operational species mapping; SUPPORT VECTOR MACHINE; IMAGING SPECTROSCOPY; ESTIMATING AREA; LIDAR DATA; ACCURACY; ERROR; BIODIVERSITY; SCIENCE; IMAGERY
DOI
10.3390/rs8020161
Chinese Library Classification (CLC) number
X [Environmental Science, Safety Science]
Subject classification code
08; 0830
Abstract
Mapping species through classification of imaging spectroscopy data is facilitating research on tree species distributions at increasingly large spatial scales. Classification requires a dataset of field observations matched to the image, which often reflects natural species distributions, resulting in an imbalanced dataset with many samples for common species and few samples for less common species. Despite the high prevalence of imbalanced datasets in multiclass species predictions, their effect on species prediction accuracy and on estimates of landscape species abundance has not yet been quantified. First, we trained a support vector machine (SVM) model and assessed its accuracy using a highly imbalanced dataset of 20 tropical species and one mixed-species class of 24 species, identified in a hyperspectral image mosaic (350-2500 nm) of Panamanian farmland and secondary forest fragments. The model, with an overall accuracy of 62% ± 2.3% and an F-score of 59% ± 2.7%, was applied to the full image mosaic (23,000 ha at 2-m resolution) to produce a species prediction map, which suggested that this tropical agricultural landscape is more diverse than field-based studies have indicated. Second, we quantified the effect of class imbalance on model accuracy. Model assessment showed a trend in which species with more samples were consistently over-predicted while species with fewer samples were under-predicted. Standardizing sample size reduced model accuracy, but also reduced the level of species over- and under-prediction. This study advances operational species mapping of diverse tropical landscapes by detailing the effect of imbalanced data on classification accuracy and by providing estimates of tree species abundance in an agricultural landscape.
Species maps produced with the data and methods presented here can be used in landscape analyses of species distributions to understand human or environmental effects, and to focus conservation efforts on areas with high tree cover and diversity.
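The abstract contrasts summary accuracy (overall accuracy and F-score) with a per-species bias: common species tend to be over-predicted and rare species under-predicted, and standardizing sample size reduces that bias at some cost in accuracy. The following is a minimal, standard-library-only Python sketch of those quantities, not the authors' published procedure. The helper names `undersample` and `assess` are hypothetical. Three further choices are assumptions: macro-averaging the F-score, using the ratio of predicted to reference counts as the over/under-prediction measure, and standardizing sample size by random undersampling to the rarest class. The SVM itself is omitted, so predictions from any classifier can be passed in.

```python
import random
from collections import Counter, defaultdict


def undersample(labels, n_per_class=None, seed=0):
    """Return sorted indices that give every class the same sample size.

    Hypothetical helper: draws n_per_class samples at random from each class
    (default: the size of the rarest class). n_per_class must not exceed the
    size of the rarest class, otherwise random.sample raises ValueError.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    n = min(len(ix) for ix in by_class.values()) if n_per_class is None else n_per_class
    keep = []
    for ix in by_class.values():
        keep.extend(rng.sample(ix, n))
    return sorted(keep)


def assess(reference, predicted):
    """Return (overall accuracy, macro-averaged F-score, per-class ratio).

    ratio[c] = predicted count / reference count: values above 1 mean class c
    is over-predicted, values below 1 mean it is under-predicted (an assumed
    measure, not necessarily the one used in the paper).
    """
    pairs = list(zip(reference, predicted))
    overall = sum(r == p for r, p in pairs) / len(pairs)
    ref_counts, pred_counts = Counter(reference), Counter(predicted)
    classes = sorted(set(reference) | set(predicted))
    f_scores, ratio = [], {}
    for c in classes:
        tp = sum(r == c and p == c for r, p in pairs)  # true positives for class c
        precision = tp / pred_counts[c] if pred_counts[c] else 0.0
        recall = tp / ref_counts[c] if ref_counts[c] else 0.0
        denom = precision + recall
        f_scores.append(2 * precision * recall / denom if denom else 0.0)
        ratio[c] = pred_counts[c] / ref_counts[c] if ref_counts[c] else float("inf")
    return overall, sum(f_scores) / len(f_scores), ratio


if __name__ == "__main__":
    # Toy imbalanced reference set: common species A, less common B, rare C.
    reference = ["A"] * 6 + ["B"] * 3 + ["C"]
    predicted = ["A"] * 6 + ["A", "B", "B"] + ["A"]
    overall, macro_f, ratio = assess(reference, predicted)
    print(overall, round(macro_f, 3), ratio)  # A over-predicted, B and C under-predicted
    print(undersample(reference))  # one index per class, since C has a single sample
```

In this toy example the overall accuracy is 0.8. The common class A has a ratio of 8/6 (over-predicted), while the rare class C is never predicted (ratio 0). This mirrors, on a small scale, the trend the abstract reports. Passing `undersample(...)` indices to a training step would be one way to standardize sample size before refitting a classifier.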
Pages: 21