Support vector machines for predicting distribution of sudden oak death in California

Cited by: 222
Authors
Guo, QH
Kelly, M
Graham, CH
Affiliations
[1] Univ Calif Berkeley, Dept Environm Sci Policy & Management, Berkeley, CA 94720 USA
[2] Univ Calif Berkeley, Museum Vertebrate Zool, Berkeley, CA 94720 USA
[3] SUNY Stony Brook, Dept Ecol & Evolut, Stony Brook, NY 11794 USA
Funding
National Aeronautics and Space Administration (NASA);
Keywords
geographic information systems; support vector machines; potential disease spread; sudden oak death;
DOI
10.1016/j.ecolmodel.2004.07.012
Chinese Library Classification number
Q14 [Ecology (bioecology)];
Discipline classification code
071012 ; 0713 ;
Abstract
In the central California coastal forests, a newly discovered virulent pathogen (Phytophthora ramorum) has killed hundreds of thousands of native oak trees. Predicting the potential distribution of the disease in California remains an urgent need for regulators and scientists. Most methods used to map potential ranges of species (e.g. multivariate or logistic regression) require both presence and absence data. Absence data cannot always be feasibly collected, so these methods often require the generation of 'pseudo' absence data. Other methods (e.g. BIOCLIM and DOMAIN) seek to model the presence-only data directly. In this study, we present an alternative to conventional modeling approaches by developing support vector machines (SVMs), a newer generation of machine learning algorithms used to find optimal separability between classes within datasets, to predict the potential distribution of Sudden Oak Death in California. We compared the performance of two types of SVM models: two-class SVMs with 'pseudo' absence data and one-class SVMs. Both models performed well. The one-class SVMs have a slightly better true-positive rate (0.9272 ± 0.0460 S.D.) than the two-class SVMs (0.9105 ± 0.0712 S.D.). However, the area predicted to be at risk for the disease using the one-class SVMs (18,441 km²) is much larger than that of the two-class SVMs (13,828 km²). Both models show that the majority of disease risk will occur in coastal areas. Compared with the results of the two-class SVMs, the one-class SVMs predict a potential risk in the foothills of the Sierra Nevada mountain ranges; much greater risks are also found in Los Angeles and Humboldt Counties. We believe that support vector machines, when coupled with a geographic information system (GIS), will be a useful method for dealing with presence-only data in ecological analysis over a range of scales. © 2004 Elsevier B.V. All rights reserved.
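The contrast the abstract draws between the two model types can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn's `SVC` and `OneClassSVM`, and the two environmental predictors, the sample sizes, and the settings `C`, `nu`, and `gamma` are all hypothetical and applied to synthetic data. The true-positive rate is the fraction of held-out presence points predicted as presence, and the "area at risk" is approximated as the fraction of grid cells predicted suitable.

```python
import numpy as np
from sklearn.svm import SVC, OneClassSVM

rng = np.random.default_rng(0)

# Synthetic stand-ins for two standardized environmental predictors (assumption).
presence = rng.normal(loc=[1.0, 1.0], scale=0.3, size=(200, 2))
# 'Pseudo' absence points drawn uniformly from the background environment.
pseudo_absence = rng.uniform(low=-3.0, high=3.0, size=(200, 2))

# Hold out part of the presence data to estimate the true-positive rate.
train_presence, test_presence = presence[:150], presence[150:]

# Two-class SVM: presence (1) versus pseudo-absence (0).
X_two = np.vstack([train_presence, pseudo_absence])
y_two = np.concatenate([np.ones(len(train_presence)), np.zeros(len(pseudo_absence))])
two_class = SVC(kernel="rbf", gamma="scale", C=1.0).fit(X_two, y_two)

# One-class SVM: fitted on presence data only; predicts +1 (inlier) or -1 (outlier).
one_class = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1).fit(train_presence)

def true_positive_rate(predicted_presence):
    """Fraction of known presence points that the model labels as presence."""
    return float(np.mean(predicted_presence))

tpr_two = true_positive_rate(two_class.predict(test_presence) == 1)
tpr_one = true_positive_rate(one_class.predict(test_presence) == 1)

# Approximate the predicted area at risk as the share of grid cells labeled presence.
axis = np.linspace(-3.0, 3.0, 60)
grid = np.stack(np.meshgrid(axis, axis), axis=-1).reshape(-1, 2)
area_two = float(np.mean(two_class.predict(grid) == 1))
area_one = float(np.mean(one_class.predict(grid) == 1))

print(f"two-class: TPR={tpr_two:.3f}, area fraction={area_two:.3f}")
print(f"one-class: TPR={tpr_one:.3f}, area fraction={area_one:.3f}")
```

In a GIS workflow, the grid would be replaced by raster cells of real environmental layers, and the true-positive rate would be averaged over repeated train/test splits to report a mean and standard deviation as in the abstract.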
Pages: 75-90
Page count: 16