Comparison of statistical and machine learning methods in modelling of data with multicollinearity

Cited by: 108
Authors
Garg, Akhil [1 ]
Tai, Kang [1 ]
Affiliations
[1] Nanyang Technol Univ, Sch Mech & Aerosp Engn, 50 Nanyang Ave, Singapore 639798, Singapore
Keywords
multicollinearity; factor analysis; statistics; regression; genetic programming; artificial neural network; ANN; machine learning; principal component analysis; PCA
DOI
10.1504/IJMIC.2013.053535
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Multicollinearity arises in a dataset when the predictors are correlated with one another. Models derived from such data without a check for multicollinearity may lead to erroneous system analysis. The problem can be addressed by selecting appropriate predictors from the dataset. Variable reduction methods such as B2, B4, VIF, KIF and factor analysis (FA) can be used for this purpose. These methods are particularly useful in conjunction with modelling methods that do not automate variable selection, such as artificial neural networks (ANN) and fuzzy logic. The literature shows that the problem is well described in the field of statistics but has received little attention in the field of machine learning. In this paper, the multicollinearity problem is illustrated through the estimation of body fat content. Commonly used statistical methods, namely stepwise regression, radial basis function partial least squares, partial robust M-regression, ridge regression and principal component regression, are applied to this problem. The machine learning methods FA-ANN and genetic programming are also applied. The results are discussed, and the interpretation and comparison of the modelling methods are summarised to guide users towards suitable techniques for tackling the multicollinearity problem.
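As an illustration of one of the variable reduction checks named in the abstract, the variance inflation factor (VIF) of a predictor can be computed as 1 / (1 - R^2), where R^2 comes from regressing that predictor on all the others. The sketch below is not taken from the paper: the function name `variance_inflation_factors`, the synthetic data and the threshold mentioned in the comments are assumptions made for illustration. It uses the standard identity that the VIFs equal the diagonal of the inverse of the predictor correlation matrix.

```python
import numpy as np

def variance_inflation_factors(X):
    """Return the VIF of each column of X (rows are samples, columns are predictors).

    Uses the identity VIF_j = [inv(R)]_jj, where R is the correlation
    matrix of the predictors. Hypothetical helper, not the paper's code.
    """
    X = np.asarray(X, dtype=float)
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))

# Synthetic data (an assumption): x2 is almost a copy of x1, x3 is independent.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.05 * rng.normal(size=200)
x3 = rng.normal(size=200)
X = np.column_stack([x1, x2, x3])

vif = variance_inflation_factors(X)
# A common rule of thumb (a convention, not a result from the paper)
# flags predictors with VIF above about 10 as collinear.
flagged = [j for j, v in enumerate(vif) if v > 10]
print(vif, flagged)
```

In this synthetic case the two nearly identical predictors receive large VIFs while the independent one stays close to 1, so a selection step would typically drop one of the correlated pair before fitting a model such as an ANN.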
Pages: 295-312
Page count: 18