Partial least squares-discriminant analysis (PLS-DA) for classification of high-dimensional (HD) data: a review of contemporary practice strategies and knowledge gaps

Cited by: 544
Authors
Lee, Loong Chuen [1, 2]
Liong, Choong-Yeun [2]
Jemain, Abdul Aziz [2]
Affiliations
[1] Univ Kebangsaan Malaysia, FSK, Forens Sci Programme, Jalan Raja Muda Abdul Aziz, Kuala Lumpur 50300, Malaysia
[2] Univ Kebangsaan Malaysia, FST, Stat Programme, Bangi 43600, Selangor, Malaysia
Keywords
NUCLEAR-MAGNETIC-RESONANCE; UV-VIS SPECTROSCOPY; BLUE PEN INKS; VARIABLE SELECTION; RAMAN-SPECTROSCOPY; GENETIC ALGORITHM; DATA FUSION; NONDESTRUCTIVE IDENTIFICATION; VIBRATIONAL SPECTROSCOPY; VARIETAL CLASSIFICATION;
DOI
10.1039/c8an00599k
Chinese Library Classification (CLC) number
O65 [Analytical Chemistry]
Discipline classification codes
070302; 081704
Abstract
Partial least squares-discriminant analysis (PLS-DA) is a versatile algorithm that can be used for predictive and descriptive modelling as well as for discriminative variable selection. However, versatility is both a blessing and a curse: the user needs to optimize a wealth of parameters before reaching reliable and valid outcomes. Over the past two decades, PLS-DA has demonstrated great success in modelling high-dimensional datasets for diverse purposes, e.g. product authentication in food analysis, disease classification in medical diagnosis, and evidence analysis in forensic science. Despite that, many users have yet to grasp the essentials of constructing a valid and reliable PLS-DA model. As technology progresses, datasets across every discipline are becoming more complex, i.e. multi-class, imbalanced and colossal; the community is entering the era of big data. In this context, the aim of the article is two-fold: (a) to review, outline and describe contemporary PLS-DA modelling practice strategies, and (b) to critically discuss the respective knowledge gaps that have emerged in response to the present big data era. This work could complement other available reviews or tutorials on PLS-DA and provide a timely and user-friendly guide to researchers, especially those working in applied research.
Pages: 3526-3539
Number of pages: 14