Analysis of NIR spectroscopic data using decision trees and their ensembles

Cited by: 25
Authors
Kucheryavskiy S. [1]
Affiliation
[1] Department of Chemistry and Bioscience, Aalborg University, Niels Bohrs vej, 8, Esbjerg
Keywords
Classification and regression trees; Decision trees; NIR spectroscopy; Random forests
DOI
10.1007/s41664-018-0078-0
Abstract
Decision trees and their ensembles have become quite popular for data analysis over the past decade. One of the main reasons is the current boom in big data, where traditional statistical methods (e.g., multiple linear regression) are not very efficient. In chemometrics, however, these methods are still not widespread, primarily because of several limitations related to the ratio between the number of variables and the number of observations. This paper presents several examples of how decision trees and their ensembles can be used in the analysis of NIR spectroscopic data for both regression and classification. We consider the important aspects, including optimization and validation of models, evaluation of results, treatment of missing data, and selection of the most important variables. The performance and outcome of the decision tree-based methods are compared with a more traditional approach based on partial least squares. © 2018, The Nonferrous Metals Society of China.
Pages: 274-289
Page count: 15
Related papers
24 in total
[1] Kotsiantis S.B., Decision trees: a recent overview, Artif Intell Rev, 39, pp. 261-283, (2013)
[2] Breiman L., Friedman J., Stone C.J., Olshen R.A., Classification and regression trees, (1984)
[3] Kegelmeyer W., Banfield R.E., Hall L.O., Bowyer K.W., A comparison of decision tree ensemble creation techniques, IEEE Trans Pattern Anal Mach Intell, 29, pp. 173-180, (2007)
[4] Breiman L., Random forests, Mach Learn, 45, pp. 5-32, (2001)
[5] Friedman J.H., Stochastic gradient boosting, Comput Stat Data Anal, 38, pp. 367-378, (2002)
[6] Chan J.C.-W., Paelinckx D., Evaluation of random forest and Adaboost tree-based ensemble classification and spectral band selection for ecotope mapping using airborne hyperspectral imagery, Remote Sens Environ, 112, pp. 2999-3011, (2008)
[7] Menze B.H., Kelm B.M., Masuch R., Himmelreich U., Bachert P., Petrich W., Hamprecht F.A., A comparison of random forest and its Gini importance with standard chemometric methods for the feature selection and classification of spectral data, BMC Bioinform, 10, (2009)
[8] Mu K.-X., Feng Y.-Z., Chen W., Yu W., Near infrared spectroscopy for classification of bacterial pathogen strains based on spectral transforms and machine learning, Chemom Intell Lab Syst, 179, pp. 46-53, (2018)
[9] Douglas R.K., Nawar S., Cipullo S., Alamar M.C., Coulon F., Mouazen A.M., Evaluation of vis-NIR reflectance spectroscopy sensitivity to weathering for enhanced assessment of oil contaminated soils, Sci Total Environ, 626, pp. 1108-1120, (2018)
[10] R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, (2018)