A comparison of molecular representations for lipophilicity quantitative structure-property relationships with results from the SAMPL6 logP Prediction Challenge

Cited by: 13
Authors
Lui, Raymond [1]
Guan, Davy [1]
Matthews, Slade [1]
Affiliations
[1] Univ Sydney, Fac Med & Hlth, Sch Med Sci, Pharmacoinformat Lab,Discipline Pharmacol, Sydney, NSW 2006, Australia
Funding
National Institutes of Health (USA);
Keywords
QSPR; logP; Physicochemical properties; Machine learning; SAMPL6; ELECTROTOPOLOGICAL-STATE; PARTITION-COEFFICIENT; INDEX; MODELS;
DOI
10.1007/s10822-020-00279-0
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010; 081704;
Abstract
Effective representation of a molecule is required to develop useful quantitative structure-property relationships (QSPR) for accurate prediction of chemical properties. The octanol-water partition coefficient logP, a measure of lipophilicity, is an important property for pharmacological and toxicological endpoints used in the pharmaceutical and regulatory spheres. We compare physicochemical descriptors, structural keys, and circular fingerprints in their ability to effectively represent a chemical space and characterise molecular features to correlate with lipophilicity. Exploratory landscape continuity analyses revealed that whole-molecule physicochemical descriptors could map together compounds that were similar in both molecular features and logP, indicating higher potential for use in logP QSPRs compared to the substructural approach of structural keys and circular fingerprints. Indeed, logP QSPR models parameterised by physicochemical descriptors consistently performed with the lowest error. Our best performing model was a stochastic gradient descent-optimised multilinear regression with 1438 descriptors, returning an internal benchmark RMSE of 1.03 log units. This corroborates the well-established notion that lipophilicity is an additive, whole-molecule property. We externally tested the model by participating in the 2019 SAMPL6 logP Prediction Challenge and blindly predicting for 11 protein kinase inhibitor fragment-like molecules. Our model returned an RMSE of 0.49 log units, placing eighth overall and third in the empirical methods category (submission ID 'hdpuj'). Permutation feature importance analyses revealed that physicochemical descriptors could characterise predictive molecular features highly relevant to the kinase inhibitor fragment-like molecules.
Pages: 523-534
Page count: 12
Related papers
40 in total
  • [1] Why is Tanimoto index an appropriate choice for fingerprint-based similarity calculations?
    Bajusz, David
    Racz, Anita
    Heberger, Karoly
    [J]. JOURNAL OF CHEMINFORMATICS, 2015, 7
  • [2] In Silico Log P Prediction for a Large Data Set with Support Vector Machines, Radial Basis Neural Networks and Multiple Linear Regression
    Chen, Hai-Feng
    [J]. CHEMICAL BIOLOGY & DRUG DESIGN, 2009, 74 (02) : 142 - 147
  • [3] Computation of octanol-water partition coefficients by guiding an additive model with knowledge
    Cheng, Tiejun
    Zhao, Yuan
    Li, Xun
    Lin, Fu
    Xu, Yong
    Zhang, Xinglong
    Li, Yan
    Wang, Renxiao
    Lai, Luhua
    [J]. JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2007, 47 (06) : 2140 - 2148
  • [4] QSAR Modeling: Where Have You Been? Where Are You Going To?
    Cherkasov, Artem
    Muratov, Eugene N.
    Fourches, Denis
    Varnek, Alexandre
    Baskin, Igor I.
    Cronin, Mark
    Dearden, John
    Gramatica, Paola
    Martin, Yvonne C.
    Todeschini, Roberto
    Consonni, Viviana
    Kuz'min, Victor E.
    Cramer, Richard
    Benigni, Romualdo
    Yang, Chihae
    Rathman, James
    Terfloth, Lothar
    Gasteiger, Johann
    Richard, Ann
    Tropsha, Alexander
    [J]. JOURNAL OF MEDICINAL CHEMISTRY, 2014, 57 (12) : 4977 - 5010
  • [5] iLOGP: A Simple, Robust, and Efficient Description of n-Octanol/Water Partition Coefficient for Drug Design Using the GB/SA Approach
    Daina, Antoine
    Michielin, Olivier
    Zoete, Vincent
    [J]. JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2014, 54 (12) : 3284 - 3301
  • [6] MEASURES OF THE AMOUNT OF ECOLOGIC ASSOCIATION BETWEEN SPECIES
    DICE, LR
    [J]. ECOLOGY, 1945, 26 (03) : 297 - 302
  • [7] Coarse-Grained Models for Automated Fragmentation and Parametrization of Molecular Databases
    Fraaije, Johannes G. E. M.
    van Male, Jan
    Becherer, Paul
    Gracia, Ruben Serral
    [J]. JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2016, 56 (12) : 2361 - 2377
  • [8] NEW SUBSTITUENT CONSTANT PI DERIVED FROM PARTITION COEFFICIENTS
    FUJITA, T
    HANSCH, C
    IWASA, J
    [J]. JOURNAL OF THE AMERICAN CHEMICAL SOCIETY, 1964, 86 (23) : 5175 - &
  • [9] Developing Collaborative QSAR Models Without Sharing Structures
    Gedeck, Peter
    Skolnik, Suzanne
    Rodde, Stephane
    [J]. JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2017, 57 (08) : 1847 - 1858
  • [10] Structure-activity landscape index: Identifying and quantifying activity cliffs
    Guha, Rajarshi
    Van Drie, John H.
    [J]. JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2008, 48 (03) : 646 - 658