Selection of input variables for data driven models: An average shifted histogram partial mutual information estimator approach

Times cited: 133
Authors
Fernando, T. M. K. G. [1]
Maier, H. R. [1]
Dandy, G. C. [1]
Affiliation
[1] University of Adelaide, School of Civil, Environmental and Mining Engineering, Adelaide, SA, Australia
Funding
Australian Research Council
Keywords
Artificial neural networks; Input selection; Average shifted histograms; Mutual information; RAINFALL PROBABILISTIC FORECASTS; WATER-RESOURCES APPLICATIONS; ARTIFICIAL NEURAL-NETWORKS; BANDWIDTH SELECTION; SUPPLY MANAGEMENT; IDENTIFICATION; PREDICTION; SALINITY
DOI
10.1016/j.jhydrol.2008.10.019
Chinese Library Classification (CLC) number
TU [Building science]
Subject classification code
0813
Abstract
The use of artificial neural networks (ANNs) for the modelling of water resources variables has increased rapidly in recent years. This paper addresses one of the important issues associated with ANN model development: input variable selection. In this study, the partial mutual information (PMI) input selection algorithm is modified to increase its computational efficiency while maintaining its accuracy. As part of the modification, the use of average shifted histograms (ASHs) is introduced as an alternative to kernel-based methods for the estimation of mutual information (MI). Empirical guidelines are developed to estimate the key ASH parameters as a function of sample size. The stopping criterion used with the original PMI algorithm is replaced with a more computationally efficient outlier detection technique based on the Hampel distance. The performance of the proposed PMI algorithm, in terms of computational efficiency and input selection accuracy, is first investigated by using it to identify significant variables for data series in which the dependencies between attributes are known a priori. The proposed ASH PMI input variable selection algorithm with the Hampel distance stopping criterion consistently selects the correct inputs while remaining computationally efficient. The modified PMI algorithm is then applied to identify suitable inputs for forecasting salinity in the River Murray at Murray Bridge, South Australia, with a lead time of 14 days, using an ANN approach. The ANN models developed with inputs selected by the modified PMI algorithm perform very well compared with ANN models that use the different input sets developed in previous studies. Furthermore, the proposed input variable selection algorithm results in more parsimonious ANN models. © 2008 Elsevier B.V. All rights reserved.
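To illustrate what an ASH-based MI estimate can look like, the following is a minimal sketch, not the authors' implementation. It assumes the common sample-average MI estimator, MI ≈ (1/n) Σ log[f_xy(x_i, y_i) / (f_x(x_i) f_y(y_i))], with every density evaluated by an average of m shifted histograms per dimension. The paper's empirical sample-size guidelines for the ASH parameters (presumably the bin width h and the number of shifts m) are not reproduced here. As a stand-in, the hypothetical helper `normal_reference_bin_width` uses Scott's normal-reference rule h_k = 3.5 σ_k n^(-1/(d+2)), and m = 8 is an arbitrary placeholder.

```python
import itertools

import numpy as np


def ash_density_at_samples(data, h, m):
    """ASH density estimate evaluated at each sample point.

    data: array of shape (n,) or (n, d); h: bin width per dimension;
    m: number of shifted grids per dimension (m**d histograms are averaged).
    """
    data = np.asarray(data, dtype=float)
    if data.ndim == 1:
        data = data[:, None]
    n, d = data.shape
    h = np.broadcast_to(np.asarray(h, dtype=float), (d,))
    origin = data.min(axis=0)
    density = np.zeros(n)
    for offsets in itertools.product(range(m), repeat=d):
        # Shift the grid origin by offset/m of a bin width in each dimension.
        shift = np.asarray(offsets) * h / m
        bins = np.floor((data - origin + shift) / h).astype(np.int64)
        # Collapse the d-dimensional bin index into one integer key per sample.
        dims = tuple(int(v) for v in bins.max(axis=0) + 1)
        keys = np.ravel_multi_index(tuple(bins.T), dims)
        counts = np.bincount(keys)
        # Histogram density at each sample: bin count / (n * bin volume).
        density += counts[keys] / (n * np.prod(h))
    return density / m**d


def normal_reference_bin_width(data):
    """Hypothetical placeholder rule (Scott), NOT the paper's guidelines."""
    data = np.asarray(data, dtype=float)
    if data.ndim == 1:
        data = data[:, None]
    n, d = data.shape
    return 3.5 * data.std(axis=0, ddof=1) * n ** (-1.0 / (d + 2))


def ash_mutual_information(x, y, m=8):
    """Sample-average MI estimate (in nats) from ASH density estimates."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xy = np.column_stack([x, y])
    f_x = ash_density_at_samples(x, normal_reference_bin_width(x), m)
    f_y = ash_density_at_samples(y, normal_reference_bin_width(y), m)
    f_xy = ash_density_at_samples(xy, normal_reference_bin_width(xy), m)
    # Every density is positive: each sample always falls in its own bin.
    return float(np.mean(np.log(f_xy / (f_x * f_y))))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=1000)
    print(ash_mutual_information(x, x + 0.3 * rng.normal(size=1000)))  # dependent pair
    print(ash_mutual_information(x, rng.normal(size=1000)))            # independent pair
```

The dependent pair should give a much larger estimate than the independent pair, whose estimate should sit near zero apart from a small positive bias. Unlike a kernel estimator, each histogram pass costs only O(n) binning and counting, which is the kind of saving the abstract attributes to ASHs.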
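The abstract replaces the original PMI stopping rule with outlier detection based on the Hampel distance. A minimal sketch of one plausible reading follows. It assumes the rule is applied to the PMI scores of the remaining candidate inputs at each iteration, and that the top-scoring candidate is kept only if it is an outlier by the Hampel distance |s - median| / (1.4826 · MAD). The cutoff of 3 is a commonly used value for the Hampel identifier and is an assumption here. `select_next_input` is a hypothetical helper name.

```python
import numpy as np


def hampel_distances(scores):
    """Hampel distance of each score: |score - median| / (1.4826 * MAD).

    The factor 1.4826 makes the MAD a consistent estimate of the standard
    deviation for normally distributed data.
    """
    scores = np.asarray(scores, dtype=float)
    deviation = np.abs(scores - np.median(scores))
    scale = 1.4826 * np.median(deviation)
    if scale == 0.0:
        # Degenerate case: at least half of the scores equal the median.
        return np.where(deviation > 0.0, np.inf, 0.0)
    return deviation / scale


def select_next_input(pmi_scores, threshold=3.0):
    """Hypothetical stopping check for one PMI iteration.

    Returns the index of the highest-PMI candidate if its Hampel distance
    exceeds `threshold`; otherwise returns None, meaning selection stops.
    """
    pmi_scores = np.asarray(pmi_scores, dtype=float)
    best = int(np.argmax(pmi_scores))
    if hampel_distances(pmi_scores)[best] > threshold:
        return best
    return None


if __name__ == "__main__":
    print(select_next_input([0.010, 0.020, 0.015, 0.012, 0.500]))  # 4
    print(select_next_input([0.010, 0.020, 0.015, 0.012, 0.018]))  # None
```

In the first call, one candidate stands far above the others and is accepted. In the second call, the best candidate is about 1.1 robust standard deviations from the median, so selection stops. The rule needs only a median and a MAD over scores that are already computed, which is consistent with the abstract's description of it as computationally cheap.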
Pages: 165-176
Page count: 12