On classification trees and random forests in corpus linguistics: Some words of caution and suggestions for improvement

Cited by: 59
Author
Gries, Stefan Th. [1, 2]
Affiliations
[1] University of California, Santa Barbara, Department of Linguistics, Santa Barbara, CA 93106, USA
[2] Justus Liebig University Giessen, Giessen, Germany
Keywords
classification trees; random forests; regression modeling
DOI
10.1515/cllt-2018-0078
Chinese Library Classification (CLC) number
H0 [Linguistics]
Discipline classification codes
030303; 0501; 050102
Abstract
This paper is a discussion of methodological problems that (can) arise in the analysis of multifactorial data analyzed with tree-based or forest-based classifiers in (corpus) linguistics. I showcase a data set that highlights where such methods can fail at providing optimal results and then discuss solutions to this problem as well as the interpretation of random forests more generally.
Pages: 617-647
Page count: 31