Use of genetic algorithms to select input variables in decision tree models for the prediction of benthic macroinvertebrates

Cited by: 73
Authors
D'heygere, T [1]
Goethals, PLM [1]
De Pauw, N [1]
Affiliations
[1] Univ Ghent, Lab Environm Toxicol & Aquat Ecol, B-9000 Ghent, Belgium
Keywords
benthic macroinvertebrates; predictive models; genetic algorithm; decision trees; physical-chemical, ecotoxicological and structural variables
DOI
10.1016/S0304-3800(02)00260-0
Chinese Library Classification number
Q14 [Ecology (Bioecology)]
Discipline classification codes
071012; 0713
Abstract
Predicting freshwater organisms with machine learning is becoming increasingly reliable owing to the availability of appropriate datasets, advanced modelling techniques and the continuously increasing capacity of computers. A database of measurements collected at 360 sampling sites in non-navigable watercourses in Flanders was used to predict the presence or absence of benthic macroinvertebrate taxa by means of decision trees. The measured variables were a combination of physical-chemical variables (temperature, pH, dissolved oxygen concentration, conductivity, total organic carbon, Kjeldahl nitrogen and total phosphorus), structural variables (granulometric analysis of the sediment, and width, depth and flow velocity of the river) and two ecotoxicological variables. The predictive power of the decision trees was assessed on the basis of the number of Correctly Classified Instances (CCI). A genetic algorithm was introduced to compare the predictive power of different sets of input variables for the decision trees. The number of input variables was reduced from 15 to between 2 and 8 without significantly affecting the predictive power of the decision trees. Furthermore, reducing the number of input variables made general trends in the data easier to identify. (C) 2002 Elsevier Science B.V. All rights reserved.
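The variable-selection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each candidate set of input variables is encoded as a 15-bit mask, and it scores a mask by the CCI fraction on a held-out half of the data. A one-split tree (a stump) stands in for the full decision trees to keep the example short. The data are synthetic, and the penalty per variable, the population size, the number of generations and all function names are hypothetical choices made only for illustration.

```python
import random

N_VARS = 15  # the abstract's starting number of input variables

def cci(y_true, y_pred):
    """Fraction of Correctly Classified Instances."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def fit_stump(X, y, mask):
    """Fit a one-split tree using only the variables switched on in mask."""
    best_score, best_rule = -1.0, None
    for j, used in enumerate(mask):
        if not used:
            continue
        values = sorted(row[j] for row in X)
        for q in range(1, 10):  # candidate thresholds at the deciles
            thr = values[q * len(values) // 10]
            for above in (0, 1):  # class predicted when value > thr
                pred = [above if row[j] > thr else 1 - above for row in X]
                score = cci(y, pred)
                if score > best_score:
                    best_score, best_rule = score, (j, thr, above)
    return best_rule

def predict(rule, X):
    """Apply a fitted (variable, threshold, class-above) rule to rows of X."""
    j, thr, above = rule
    return [above if row[j] > thr else 1 - above for row in X]

def fitness(mask, train, valid, penalty=0.01):
    """Validation CCI minus a small (assumed) cost per selected variable."""
    if not any(mask):
        return 0.0
    rule = fit_stump(train[0], train[1], mask)
    return cci(valid[1], predict(rule, valid[0])) - penalty * sum(mask)

def select_variables(train, valid, rng, pop_size=30, generations=40):
    """Genetic search over bitmasks; returns the fittest mask found."""
    cache = {}

    def score(mask):
        key = tuple(mask)
        if key not in cache:
            cache[key] = fitness(mask, train, valid)
        return cache[key]

    pop = [[rng.randint(0, 1) for _ in range(N_VARS)] for _ in range(pop_size)]
    for _ in range(generations):
        children = [max(pop, key=score)[:]]  # elitism: keep the best mask
        while len(children) < pop_size:
            a = max(rng.sample(pop, 3), key=score)  # tournament selection
            b = max(rng.sample(pop, 3), key=score)
            cut = rng.randint(1, N_VARS - 1)  # one-point crossover
            child = a[:cut] + b[cut:]
            # bit-flip mutation, on average one flip per child
            child = [bit ^ (rng.random() < 1 / N_VARS) for bit in child]
            children.append(child)
        pop = children
    return max(pop, key=score)

def run_demo(seed=0):
    """Synthetic example: presence depends on variable 3 only."""
    rng = random.Random(seed)
    X = [[rng.random() for _ in range(N_VARS)] for _ in range(120)]
    y = [1 if row[3] > 0.5 else 0 for row in X]
    train, valid = (X[:60], y[:60]), (X[60:], y[60:])
    mask = select_variables(train, valid, rng)
    rule = fit_stump(train[0], train[1], mask)
    return mask, cci(valid[1], predict(rule, valid[0]))

if __name__ == "__main__":
    mask, score = run_demo()
    print("selected variables:", [j for j, bit in enumerate(mask) if bit])
    print("validation CCI:", round(score, 3))
```

The penalty term is what pushes the search toward smaller variable sets. With the penalty set to zero, masks that contain redundant variables score the same as compact ones, so nothing favours the reduction described in the abstract.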
Pages: 291-300
Number of pages: 10