Interpretation of nonlinear relationships between process variables by use of random forests

Times cited: 176
Authors
Auret, Lidia [2]
Aldrich, Chris [1]
Affiliations
[1] Curtin Univ Technol, Western Australian Sch Mines, Perth, WA 6845, Australia
[2] Univ Stellenbosch, Dept Proc Engn, ZA-7602 Stellenbosch, South Africa
Keywords
Modelling; Pyrometallurgy; Comminution; MODELS;
DOI
10.1016/j.mineng.2012.05.008
Chinese Library Classification (CLC) number
TQ [Chemical industry];
Subject classification code
0817;
Abstract
Better understanding of process phenomena depends on interpreting models that capture the relationships between process variables. Although linear regression is used routinely for this purpose in the mineral process industries, it may not be useful where the relationships between variables are nonlinear or complex. Under these circumstances, nonlinear methods, such as neural networks or decision trees, can be used to develop reliable models, but they do not necessarily give any explicit insight into the relationships between the process variables and the target variables. This is a major drawback in situations where such information is important, such as fault identification or gaining a better understanding of the fundamentals of a process. In this paper, variable importance measures and partial dependency plots generated by random forest models are proposed as a practical tool to overcome this problem. In particular, it is shown that important variables can be flagged by an appropriate threshold generated by including dummy variables in the system. Moreover, the results of the study indicate that random forest models can reliably identify the influence of individual variables, even in the presence of high levels of additive noise. This would make random forests a useful tool for continuous process improvement and root cause analysis of abnormal process behaviour. (C) 2012 Elsevier Ltd. All rights reserved.
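The abstract combines three ingredients: a random forest, a variable importance measure thresholded against dummy (pure-noise) variables, and partial dependence. The pure-Python sketch below shows how these pieces fit together; it is not the authors' code. It rests on several assumptions that the abstract does not state. Importance is measured by permutation on a held-out set. The flagging threshold is the largest importance among the dummy columns. The split search is approximate, trying about ten cut points per feature. The data come from a synthetic process, y = 4*x0**2 + 2*x1 + Gaussian noise, in which x2 is deliberately irrelevant. All function names are hypothetical.

```python
import random
import statistics

def _sse(ys):
    """Sum of squared deviations from the mean (regression-tree node impurity)."""
    m = statistics.fmean(ys)
    return sum((v - m) ** 2 for v in ys)

def build_tree(X, y, rng, max_depth=6, mtry=2, min_leaf=5, n_cuts=10):
    """Grow one regression tree. A leaf is a float; a split is (feature, cut, left, right)."""
    if max_depth == 0 or len(y) < 2 * min_leaf:
        return statistics.fmean(y)
    best = None
    for j in rng.sample(range(len(X[0])), mtry):  # random feature subset at each node
        vals = sorted({row[j] for row in X})
        step = max(1, len(vals) // n_cuts)  # approximate search over ~n_cuts cut points
        for k in range(step, len(vals), step):
            cut = (vals[k - 1] + vals[k]) / 2
            left = [v for row, v in zip(X, y) if row[j] <= cut]
            right = [v for row, v in zip(X, y) if row[j] > cut]
            if min(len(left), len(right)) < min_leaf:
                continue
            score = _sse(left) + _sse(right)
            if best is None or score < best[0]:
                best = (score, j, cut)
    if best is None:
        return statistics.fmean(y)
    _, j, cut = best
    go_left = [row[j] <= cut for row in X]
    kids = []
    for side in (True, False):  # True -> left child, False -> right child
        Xs = [row for row, g in zip(X, go_left) if g == side]
        ys = [v for v, g in zip(y, go_left) if g == side]
        kids.append(build_tree(Xs, ys, rng, max_depth - 1, mtry, min_leaf, n_cuts))
    return (j, cut, kids[0], kids[1])

def predict_tree(node, row):
    while isinstance(node, tuple):
        j, cut, left, right = node
        node = left if row[j] <= cut else right
    return node

def predict(forest, row):
    """Forest prediction = mean of the individual tree predictions."""
    return statistics.fmean([predict_tree(tree, row) for tree in forest])

def fit_forest(X, y, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(y)) for _ in range(len(y))]  # bootstrap resample
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], rng))
    return forest

def mse(forest, X, y):
    return statistics.fmean([(predict(forest, row) - v) ** 2 for row, v in zip(X, y)])

def permutation_importance(forest, X, y, seed=1):
    """Increase in held-out MSE after randomly permuting one column at a time."""
    rng = random.Random(seed)
    base = mse(forest, X, y)
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        rng.shuffle(col)
        Xp = [row[:j] + [c] + row[j + 1:] for row, c in zip(X, col)]
        scores.append(mse(forest, Xp, y) - base)
    return scores

def partial_dependence(forest, X, j, grid):
    """Average forest prediction with feature j clamped to each grid value."""
    return [statistics.fmean([predict(forest, row[:j] + [g] + row[j + 1:]) for row in X])
            for g in grid]

def make_data(n, rng, n_dummies=3):
    """Hypothetical process: y depends on x0 (nonlinearly) and x1; x2 is irrelevant."""
    X, y = [], []
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(0, 1), rng.uniform(0, 1)]
        x += [rng.uniform(0, 1) for _ in range(n_dummies)]  # dummy (pure-noise) columns
        X.append(x)
        y.append(4 * x[0] ** 2 + 2 * x[1] + rng.gauss(0, 0.1))
    return X, y

rng = random.Random(42)
X_train, y_train = make_data(300, rng)
X_test, y_test = make_data(200, rng)
forest = fit_forest(X_train, y_train)
imp = permutation_importance(forest, X_test, y_test)
threshold = max(imp[3:])  # assumed rule: largest importance among the dummy columns
flagged = [j for j in range(3) if imp[j] > threshold]
pd_x0 = partial_dependence(forest, X_test, 0, [-0.9, 0.0, 0.9])
print("importances:", [round(s, 3) for s in imp], "flagged:", flagged)
print("partial dependence on x0:", [round(p, 2) for p in pd_x0])
```

In this synthetic process, x2 and the dummies are all pure noise. As a result, x2 can occasionally land above the dummy threshold by chance. Adding more dummies or averaging over repeated permutations reduces such false flags. The partial dependence on x0 should come out roughly U-shaped, which mirrors the quadratic term in the simulated process.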
Pages: 27-42
Number of pages: 16