An Introduction to Machine Learning for Panel Data

Cited by: 13
Author
Chen, James Ming [1, 2]
Affiliations
[1] Michigan State Univ, Justin Smith Morrill Chair Law, E Lansing, MI 48824 USA
[2] Silver Leaf Capital LLC, New York, NY 91324 USA
Keywords
Machine learning; Bias-variance tradeoff; Decision trees; Random forests; Extra trees; XGBoost; Learning ensembles; Boosting; Support vector machines; Neural networks
DOI
10.1007/s11294-021-09815-6
Chinese Library Classification (CLC) number
F [Economics]
Subject classification code
02
Abstract
Machine learning has dramatically expanded the range of tools for evaluating economic panel data. This paper applies a variety of machine-learning methods to the Boston housing dataset, an iconic proving ground for machine learning. Though machine learning often lacks the overt interpretability of linear regression, methods based on decision trees score the relative importance of dataset features. In addition to addressing the theoretical tradeoff between bias and variance, this paper discusses practices rarely followed in traditional economics: the splitting of data into training, validation, and test sets; the scaling of data; and the preference for retaining all data. The choice between traditional and machine-learning methods hinges on practical rather than mathematical considerations. In settings emphasizing interpretative clarity through the scale and sign of regression coefficients, machine learning may best play an ancillary role. Wherever predictive accuracy is paramount, however, or where heteroskedasticity or high dimensionality might impair the clarity of linear methods, machine learning can deliver superior results.
Pages: 1-16
Page count: 16
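The abstract names two practices uncommon in traditional econometrics: splitting data into training, validation, and test sets, and scaling the data. The following is a minimal standard-library Python sketch of both, not code from the paper. The function names, the 60/20/20 proportions, and the fixed seed are illustrative assumptions. Scaling statistics are fitted on the training split only and then applied to every split, a common convention that keeps validation and test information out of preprocessing.

```python
import random
import statistics


def train_val_test_split(rows, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle rows and cut them into training, validation, and test sets.

    The 60/20/20 default and the seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    shuffled = rows[:]  # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # the remainder becomes the test set
    return train, val, test


def fit_standardizer(train_rows):
    """Compute per-column mean and population standard deviation from training data only."""
    columns = list(zip(*train_rows))
    means = [statistics.fmean(col) for col in columns]
    # A constant column has zero spread; fall back to 1.0 to avoid dividing by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds


def standardize(rows, means, stds):
    """Apply training-set statistics to any split (z-score scaling)."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]
```

In use, a model would be fitted on the standardized training rows, compared against alternatives on the validation rows, and scored once on the test rows, with `standardize(val, means, stds)` and `standardize(test, means, stds)` reusing the training statistics rather than recomputing them.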