Robust QSAR model development in high-throughput catalyst discovery based on genetic parameter optimisation

Times cited: 8
Authors
Wang, Xue Z. [1]
Perston, Ben [1]
Yang, Yang [1]
Lin, Tian [2]
Darr, Jawwad A. [2]
Affiliations
[1] Univ Leeds, Sch Proc Environm & Mat Engn, Inst Particle Sci & Engn, Leeds LS2 9JT, W Yorkshire, England
[2] UCL, Christopher Ingold Labs, Dept Chem, London WC1H 0AJ, England
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
Keywords
Quantitative structure-activity relationships (QSAR); Genetic algorithm; High throughput; Ceramics; Neural networks; Data mining; Support vector machine; Decision tree; Combinatorial; Design; Libraries; Space
DOI
10.1016/j.cherd.2009.01.013
Chinese Library Classification number
TQ [Chemical industry]
Subject classification code
0817
Abstract
High-throughput strategies are gaining importance in catalyst formulation and discovery. The increased experimental capacity produces valuable data from which quantitative structure-activity relationship (QSAR) models can be developed to link catalyst composition and structure with final performance. Various QSAR modelling algorithms are available; however, they are generally configurable, and their performance is highly dependent on the correct choice of parameters. With the proliferation and increasing sophistication of integrated data-mining tools, there is a need for systematic, robust, and generic parameter optimisation methods. This paper investigates a genetic algorithm (GA) for parameter optimisation of several QSAR methods for classification and regression, including feed-forward neural networks, decision tree generators, and support vector machines, with cross-validation providing the performance estimate. The methods were applied to four datasets: three from recent reports of high-throughput studies and one from our own laboratory. The results confirm that parameter optimisation is a critical step in QSAR modelling and demonstrate the effectiveness of the GA approach. The best results were shared among the modelling methods, emphasising the importance of considering more than one type of model. (C) 2009 The Institution of Chemical Engineers. Published by Elsevier B.V. All rights reserved.
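The abstract describes a GA that searches a model's configurable parameters, scoring each candidate setting by its cross-validated performance. The following is a minimal sketch of that loop under explicit assumptions, not the authors' implementation: it tunes a k-nearest-neighbour classifier (the hypothetical genes are the neighbour count `k` and the Minkowski exponent `p`) in place of the neural networks, decision trees, and SVMs studied in the paper, and it uses a hypothetical synthetic two-descriptor dataset. The GA operators (tournament selection, uniform crossover, bounded mutation, elitism), the parameter ranges, and every function name are illustrative choices, and only the Python standard library is used.

```python
import random

# Hypothetical search space for the two genes (illustrative, not from the paper).
K_RANGE = (1, 15)      # number of neighbours
P_RANGE = (1.0, 3.0)   # Minkowski distance exponent


def make_dataset(n=60, seed=0):
    """Hypothetical synthetic two-class data standing in for catalyst descriptors."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = (rng.uniform(0, 1), rng.uniform(0, 1))
        data.append((x, 1 if x[0] + x[1] > 1.0 else 0))
    return data


def knn_predict(train, x, k, p):
    """Majority vote among the k training points nearest to x (Minkowski-p distance)."""
    dists = sorted(
        (sum(abs(a - b) ** p for a, b in zip(xi, x)) ** (1.0 / p), yi)
        for xi, yi in train
    )
    votes = sum(y for _, y in dists[:k])
    return 1 if votes * 2 > k else 0


def cv_accuracy(data, k, p, folds=5):
    """k-fold cross-validated accuracy, used as the GA fitness estimate."""
    correct = 0
    for f in range(folds):
        test = data[f::folds]  # every folds-th sample, offset f
        train = [d for i, d in enumerate(data) if i % folds != f]
        correct += sum(knn_predict(train, x, k, p) == y for x, y in test)
    return correct / len(data)


def random_individual(rng):
    return (rng.randint(*K_RANGE), rng.uniform(*P_RANGE))


def crossover(a, b, rng):
    # Uniform crossover: each gene is copied from either parent with equal probability.
    return tuple(a[i] if rng.random() < 0.5 else b[i] for i in range(2))


def mutate(ind, rng, rate=0.2):
    # Bounded mutation: small random steps, clamped to the allowed ranges.
    k, p = ind
    if rng.random() < rate:
        k = min(max(k + rng.choice((-2, -1, 1, 2)), K_RANGE[0]), K_RANGE[1])
    if rng.random() < rate:
        p = min(max(p + rng.gauss(0, 0.3), P_RANGE[0]), P_RANGE[1])
    return (k, p)


def tournament(pop, fitness, rng, size=3):
    # Pick the fittest of `size` randomly chosen individuals.
    contenders = rng.sample(range(len(pop)), size)
    return pop[max(contenders, key=lambda i: fitness[i])]


def ga_optimise(data, pop_size=12, generations=10, seed=1):
    """Return the best (k, p) found and its cross-validated accuracy."""
    rng = random.Random(seed)
    pop = [random_individual(rng) for _ in range(pop_size)]
    best, best_fit = None, -1.0
    for _ in range(generations):
        fitness = [cv_accuracy(data, k, p) for k, p in pop]
        for ind, fit in zip(pop, fitness):
            if fit > best_fit:
                best, best_fit = ind, fit
        children = [best]  # elitism: keep the best-so-far individual
        while len(children) < pop_size:
            a = tournament(pop, fitness, rng)
            b = tournament(pop, fitness, rng)
            children.append(mutate(crossover(a, b, rng), rng))
        pop = children
    return best, best_fit


if __name__ == "__main__":
    params, acc = ga_optimise(make_dataset())
    print(f"best k={params[0]}, p={params[1]:.2f}, CV accuracy={acc:.3f}")
```

Because the fitness is a cross-validated score rather than training accuracy, the GA is steered away from parameter settings that merely fit the training samples; in practice the same loop could wrap any configurable model by changing the genes and the `cv_accuracy` call.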
Pages: 1420-1429
Page count: 10