Comparison of the decision tree, artificial neural network, and linear regression methods based on the number and types of independent variables and sample size

Cited by: 72
Author
Kim, Yong Soo [1]
Affiliation
[1] SK Telecom, CI Div, Seoul 100999, South Korea
Keywords
data mining; statistical method; artificial neural network; decision tree; linear regression
DOI
10.1016/j.eswa.2006.12.017
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this article, the performance of data mining and statistical techniques was compared empirically while varying the number of independent variables, the types of independent variables, the number of classes of the independent variables, and the sample size. The study used 60 simulated examples, with artificial neural networks and decision trees as the data mining techniques and linear regression as the statistical method. Performance was measured by the root mean square error (RMSE), and the study reports the following findings: (i) for continuous independent variables, the statistical technique (i.e., linear regression) was superior to the data mining techniques (i.e., decision tree and artificial neural network) regardless of the number of variables and the sample size; (ii) for a mix of continuous and categorical independent variables, linear regression was best when there was one categorical variable, while the artificial neural network was superior when there were two or more; (iii) the performance of the artificial neural network improved faster than that of the other methods as the number of classes of the categorical variable increased. © 2006 Elsevier Ltd. All rights reserved.
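The kind of comparison the abstract describes can be illustrated with a small sketch. The abstract does not give the study's simulation design, so everything below is an assumption made for illustration: the data-generating process (one continuous predictor with a linear effect and Gaussian noise), the sample sizes, the seed, and the use of a depth-one regression tree as a stand-in for the decision tree. The neural network is omitted to keep the example dependency-free. The function names `rmse`, `fit_linear`, `fit_stump`, and `compare` are hypothetical.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def fit_linear(x, y):
    """Ordinary least squares with one predictor; returns a predict function."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda v: intercept + slope * v


def fit_stump(x, y):
    """Depth-one regression tree: a single split minimising squared error."""
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        left = [b for _, b in pairs[:i]]
        right = [b for _, b in pairs[i:]]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((b - ml) ** 2 for b in left) + sum((b - mr) ** 2 for b in right)
        if best is None or sse < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (sse, threshold, ml, mr)
    _, threshold, ml, mr = best
    return lambda v: ml if v <= threshold else mr


def compare(seed=0, n_train=200, n_test=100):
    """Simulate y = 2x + 1 + noise (assumed design) and report test RMSE."""
    rng = random.Random(seed)
    x = [rng.uniform(0.0, 10.0) for _ in range(n_train + n_test)]
    y = [2.0 * v + 1.0 + rng.gauss(0.0, 1.0) for v in x]
    x_tr, y_tr = x[:n_train], y[:n_train]
    x_te, y_te = x[n_train:], y[n_train:]
    models = {"linear": fit_linear(x_tr, y_tr), "stump": fit_stump(x_tr, y_tr)}
    return {name: rmse(y_te, [f(v) for v in x_te]) for name, f in models.items()}


if __name__ == "__main__":
    for name, score in compare().items():
        print(f"{name}: test RMSE = {score:.3f}")
```

Because the simulated relationship here is linear by construction, linear regression should reach the lower test RMSE in this sketch; that outcome reflects the assumed design and is not a reproduction of the paper's results.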
Pages: 1227-1234
Number of pages: 8