Machine-learning models for combinatorial catalyst discovery

Cited by: 38
Authors
Landrum, GA [1]
Penzotti, JE [1]
Putta, S [1]
Affiliation
[1] Rational Discovery LLC, Palo Alto, CA 94301 USA
Keywords
combinatorial chemistry; machine learning; catalysis
DOI
10.1088/0957-0233/16/1/035
Chinese Library Classification (CLC) number
T [Industrial Technology]
Subject classification code
08
Abstract
A variety of machine learning algorithms, including hierarchical clustering, decision trees, k-nearest neighbours, support vector machines and bagging, were applied to construct models to predict the molecular weight of the polymers produced by a set of 96 homogeneous catalysts. The goal of the study was to develop models that could be used to screen large virtual libraries of catalysts in order to suggest candidates for further synthesis and screening. The descriptors used to represent the catalysts did not require detailed information about the catalysts themselves; they could be calculated using only the topology of the ligands. Using an initial set of five descriptors, model accuracies of about 70% were observed from each learning algorithm. A larger descriptor set (with ten descriptors) allowed bag classifiers that were 80% accurate to be built. All models were carefully evaluated to detect overfitting (memorization of the training data) and one example of the effects of overfitting is provided. Because the descriptors used in this study can be calculated very rapidly and the models themselves are very efficient, these bag classifiers are well suited to screening very large virtual libraries.
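The overfitting check described in the abstract can be illustrated with a small sketch. This is not the authors' code or data: the 96 samples, the five descriptors, the labelling rule, the label noise, and the choice of k-nearest neighbours as the base learner are all assumptions made for illustration, and every function name here is hypothetical. The sketch compares resubstitution accuracy (scoring a model on its own training data) with an out-of-bag estimate from a bagged ensemble, where each sample is scored only by ensemble members that never saw it.

```python
import random

def make_data(n=96, n_desc=5, noise=0.15, seed=1):
    """Synthetic stand-in for a catalyst set: n samples, n_desc descriptors.
    The labelling rule and noise level are assumptions, not taken from the paper."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.random() for _ in range(n_desc)]
        y = 1 if x[0] + x[1] > 1.0 else 0
        if rng.random() < noise:  # flip some labels to mimic experimental noise
            y = 1 - y
        data.append((x, y))
    return data

def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x (squared Euclidean)."""
    nearest = sorted(
        train, key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], x))
    )[:k]
    ones = sum(label for _, label in nearest)
    return 1 if 2 * ones > len(nearest) else 0

def resubstitution_accuracy(data, k):
    """Accuracy when the model is scored on the data it was built from."""
    hits = sum(knn_predict(data, x, k) == y for x, y in data)
    return hits / len(data)

def bagged_oob_accuracy(data, n_models=25, k=1, seed=0):
    """Bagging with an out-of-bag estimate: each sample is voted on only by
    ensemble members whose bootstrap sample did not contain it."""
    rng = random.Random(seed)
    n = len(data)
    votes = [[0, 0] for _ in range(n)]  # per-sample OOB votes for class 0 / class 1
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: draw with replacement
        in_bag = set(idx)
        sample = [data[i] for i in idx]
        for i in range(n):
            if i not in in_bag:
                votes[i][knn_predict(sample, data[i][0], k)] += 1
    # Score only samples that were out of bag at least once.
    scored = [(v, data[i][1]) for i, v in enumerate(votes) if sum(v) > 0]
    correct = sum(1 for v, y in scored if (1 if v[1] > v[0] else 0) == y)
    return correct / len(scored)

data = make_data()
print("resubstitution accuracy (k=1):", resubstitution_accuracy(data, k=1))
print("out-of-bag accuracy (k=1):    ", bagged_oob_accuracy(data, k=1))
```

With k = 1 every training point is its own nearest neighbour, so the resubstitution score is perfect by construction and says nothing about predictive power. The out-of-bag score is the more honest figure, and a large gap between the two is the signature of memorization.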
Pages: 270-277
Page count: 8