Performance comparison of partial least squares-related variable selection methods for quantitative structure retention relationships modelling of retention times in reversed-phase liquid chromatography

Cited by: 47
Authors
Talebi, Mohammad [1]
Schuster, Georg [1]
Shellie, Robert A. [1]
Szucs, Roman [2]
Haddad, Paul R. [1]
Affiliations
[1] Univ Tasmania, Sch Phys Sci, ACROSS, Hobart, Tas, Australia
[2] Pfizer Global Res & Dev, Sandwich, Kent, England
Funding
Australian Research Council
Keywords
QSRR; Partial least squares (PLS); Genetic algorithm (GA); Molecular descriptors; RPLC; Retention time prediction; MEGAVARIATE ANALYSIS; PLS-REGRESSION; PREDICTION; QSRR; VALIDATION; STRATEGY; TOOL;
DOI
10.1016/j.chroma.2015.10.099
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
071010 ; 081704 ;
Abstract
The relative performance of six multivariate data analysis methods derived from or combined with partial least squares (PLS) has been compared in the context of quantitative structure-retention relationships (QSRR). The methods are genetic algorithm (GA)-PLS, Monte Carlo uninformative variable elimination (MC-UVE), competitive adaptive reweighted sampling (CARS), iteratively retaining informative variables (IRIV), the variable iterative space shrinkage approach (VISSA), and PLS with automated backward selection of predictors (autoPLS). A set of 825 molecular descriptors was computed for 86 suspected sports doping compounds and used to predict their gradient retention times in reversed-phase liquid chromatography (RPLC). The correlation between the molecular descriptors selected by each technique and the retention time was established using the PLS method. All models derived from a selected subset of descriptors outperformed the reference PLS model derived from all descriptors, with very small demands on computational time and effort. A performance comparison showed that the methods differ greatly in how many molecular descriptors they select as most relevant, ranging from 28 for CARS to 263 for MC-UVE. While VISSA provided the lowest degree of over-fitting for the training set, CARS offered the best compromise between prediction accuracy and the number of selected descriptors, with a prediction error as low as 46 s for the external test set. Only ten descriptors were common to all models, and the characteristics of these descriptors are representative of the retention mechanism in RPLC. © 2015 Elsevier B.V. All rights reserved.
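The workflow summarized in the abstract (select a subset of descriptors, refit PLS, compare prediction error on an external test set) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the data are synthetic, only the matrix size (86 compounds, 825 descriptors) is taken from the abstract, the 66/20 train/test split and the three latent variables are arbitrary, and the selection step `select_top_k` is a simple coefficient-magnitude filter, not any of the six methods compared in the paper.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model (NIPALS-style PLS1).

    Returns (x_mean, y_mean, coef) so that y_hat = y_mean + (X - x_mean) @ coef.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    n_features = X.shape[1]
    W = np.zeros((n_features, n_components))  # weight vectors
    P = np.zeros((n_features, n_components))  # X loadings
    q = np.zeros(n_components)                # y loadings
    for a in range(n_components):
        w = Xk.T @ yk
        w = w / np.linalg.norm(w)
        t = Xk @ w                            # score vector
        tt = t @ t
        p = Xk.T @ t / tt
        q[a] = yk @ t / tt
        Xk = Xk - np.outer(t, p)              # deflate X
        yk = yk - q[a] * t                    # deflate y
        W[:, a], P[:, a] = w, p
    coef = W @ np.linalg.solve(P.T @ W, q)    # regression vector in X space
    return x_mean, y_mean, coef


def pls1_predict(model, X):
    x_mean, y_mean, coef = model
    return y_mean + (X - x_mean) @ coef


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def select_top_k(X, y, n_components, k):
    """Hypothetical filter (not from the paper): keep the k descriptors
    with the largest absolute PLS coefficients on autoscaled data."""
    scale = X.std(axis=0)
    scale[scale == 0] = 1.0
    _, _, coef = pls1_fit(X / scale, y, n_components)
    return np.sort(np.argsort(np.abs(coef))[-k:])


def demo(seed=0):
    rng = np.random.default_rng(seed)
    n_compounds, n_descriptors = 86, 825      # sizes from the abstract
    X = rng.normal(size=(n_compounds, n_descriptors))   # synthetic descriptors
    beta = np.zeros(n_descriptors)
    beta[:10] = rng.uniform(20, 60, size=10)  # 10 informative columns (assumed)
    y = 600 + X @ beta + rng.normal(scale=30, size=n_compounds)  # synthetic times, s

    train, test = np.arange(0, 66), np.arange(66, 86)   # arbitrary split
    full = pls1_fit(X[train], y[train], n_components=3)
    keep = select_top_k(X[train], y[train], n_components=3, k=30)
    reduced = pls1_fit(X[train][:, keep], y[train], n_components=3)
    return {
        "selected": keep,
        "rmsep_full": rmse(y[test], pls1_predict(full, X[test])),
        "rmsep_reduced": rmse(y[test], pls1_predict(reduced, X[test][:, keep])),
    }


if __name__ == "__main__":
    result = demo()
    print(len(result["selected"]), result["rmsep_full"], result["rmsep_reduced"])
```

Running the script prints the number of retained descriptors and the test-set root mean square error for the full and reduced models. The numbers describe the synthetic data only and say nothing about the results reported in the paper.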
Pages: 69-76
Page count: 8