Trait-based risk assessment for invasive species: high performance across diverse taxonomic groups, geographic ranges and machine learning/statistical tools

Cited by: 34
Authors
Keller, Reuben P. [1]
Kocev, Dragi [2]
Dzeroski, Saso [2]
Affiliations
[1] Univ Chicago, Program Global Environm, Chicago, IL 60637 USA
[2] Jozef Stefan Inst, Dept Knowledge Technol, Ljubljana, Slovenia
Keywords
Artificial intelligence; biological invasions; logistic regression; machine learning; random forests; receiver-operator curve; traits; SUCCESS; FISHES; PINUS; SIZE;
DOI
10.1111/j.1472-4642.2011.00748.x
Chinese Library Classification number
X176 [Biodiversity conservation];
Discipline classification code
090705;
Abstract
Aim: Trait-based risk assessment for invasive species is becoming an important tool for identifying non-indigenous species that are likely to cause harm. Despite this, concerns remain that the invasion process is too complex for accurate predictions to be made. Our goal was to test risk assessment performance across a range of taxonomic and geographical scales, at different points in the invasion process, with a range of statistical and machine learning algorithms.
Location: Regional to global data sets.
Methods: We selected six data sets differing in size, geography and taxonomic scope. For each data set, we created seven risk assessment tools using a range of statistical and machine learning algorithms. Performance of tools was compared to determine the effects of data set size and scale, the algorithm used, and to determine overall performance of the trait-based risk assessment approach.
Results: Risk assessment tools with good performance were generated for all data sets. Random forests (RF) and logistic regression (LR) consistently produced tools with high performance. Other algorithms had varied performance. Despite their greater power and flexibility, machine learning algorithms did not systematically outperform statistical algorithms. Geographic scope of the data set, and size of the data set, did not systematically affect risk assessment performance.
Main conclusions: Across six representative data sets, we were able to create risk assessment tools with high performance. Additional data sets could be generated for other taxonomic groups and regions, and these could support efforts to prevent the arrival of new invaders. Random forests and LR approaches performed well for all data sets and could be used as a standard approach to risk assessment development.
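The abstract compares tools by performance, and the keywords mention the receiver-operator curve. As an illustration only, the sketch below shows one common way to summarize such a curve: the area under it, computed through the rank-based (Mann-Whitney) identity. This record does not state which performance measure the authors actually used, so treat the choice as an assumption. The function name roc_auc, the labels and both score lists are hypothetical and made up for the example; they are not data from the paper.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the rank-based (Mann-Whitney) identity.

    The value equals the probability that a randomly chosen positive
    (label 1) gets a higher score than a randomly chosen negative
    (label 0), with tied scores counted as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical example: 1 = species became invasive, 0 = it did not.
labels = [1, 1, 1, 0, 0, 0]
tool_a = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]  # made-up risk scores from one tool
tool_b = [0.6, 0.3, 0.5, 0.7, 0.4, 0.2]  # made-up risk scores from another

print("tool A AUC:", roc_auc(labels, tool_a))
print("tool B AUC:", roc_auc(labels, tool_b))
```

A value of 0.5 corresponds to ranking species no better than chance and 1.0 to perfect separation, so in this toy comparison the tool with the larger value ranks invaders above non-invaders more often.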
Pages: 451-461
Page count: 11