Optimal Piecewise Linear Regression Algorithm for QSAR Modelling

Cited by: 18
Authors
Cardoso-Silva, Jonathan [1]
Papadatos, George [2, 3]
Papageorgiou, Lazaros G. [4]
Tsoka, Sophia [1]
Affiliations
[1] Kings Coll London, Fac Nat & Math Sci, Dept Informat, Bush House, London WC2B 4BG, England
[2] European Bioinformat Inst, European Mol Biol Lab, Wellcome Trust Genome Campus, Cambridge CB10 1SD, England
[3] GlaxoSmithKline, Gunnels Wood Rd, Stevenage SG1 2NY, Herts, England
[4] UCL, Dept Chem Engn, Ctr Proc Syst Engn, Torrington Pl, London WC1E 7JE, England
Funding
UK Engineering and Physical Sciences Research Council (EPSRC)
Keywords
QSAR; regression; piecewise regression; mathematical programming; integer programming; validation; classification; prediction; discovery
DOI
10.1002/minf.201800028
Chinese Library Classification number
R914 [Medicinal chemistry]
Discipline classification code
100701
Abstract
Quantitative Structure-Activity Relationship (QSAR) models have been successfully applied to lead optimisation, virtual screening and other areas of drug discovery over the years. Recent studies, however, have focused on the development of models that are predictive but often not interpretable. In this article, we propose the application of a piecewise linear regression algorithm, OPLRAreg, to develop both predictive and interpretable QSAR models. The algorithm determines a feature to best separate the data into regions and identifies linear equations to predict the outcome variable in each region. A regularisation term is introduced to prevent overfitting problems and implicitly selects the most informative features. As OPLRAreg is based on mathematical programming, a flexible and transparent representation for optimisation problems, the algorithm also permits customised constraints to be easily added to the model. The proposed algorithm is presented as a more interpretable alternative to other commonly used machine learning algorithms and has shown comparable predictive accuracy to Random Forest, Support Vector Machine and Random Generalised Linear Model on tests with five QSAR data sets compiled from the ChEMBL database.
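The partition-then-fit idea in the abstract can be illustrated with a much simpler stand-in. The sketch below is not OPLRAreg and not the authors' mathematical programming formulation: it assumes only two regions, replaces the optimisation model with a brute-force search over candidate features and breakpoints, and uses an L2 (ridge) penalty as the regulariser. All function names and parameters (`fit_ridge`, `fit_two_region_model`, `lam`, `min_samples`) are hypothetical, and the data are synthetic.

```python
import numpy as np


def fit_ridge(X, y, lam):
    """Least-squares fit with an L2 penalty on the weights (intercept unpenalised)."""
    A = np.hstack([X, np.ones((X.shape[0], 1))])  # append an intercept column
    penalty = lam * np.eye(A.shape[1])
    penalty[-1, -1] = 0.0  # do not shrink the intercept
    return np.linalg.lstsq(A.T @ A + penalty, A.T @ y, rcond=None)[0]


def predict_linear(X, coef):
    """Evaluate one linear equation: weights are coef[:-1], intercept is coef[-1]."""
    return X @ coef[:-1] + coef[-1]


def fit_two_region_model(X, y, lam=1e-3, min_samples=5):
    """Try every (feature, breakpoint) pair and keep the lowest penalised squared error."""
    best = None
    for f in range(X.shape[1]):
        values = np.unique(X[:, f])
        # Candidate breakpoints: midpoints between consecutive distinct values.
        for b in (values[:-1] + values[1:]) / 2.0:
            left = X[:, f] <= b
            masks = (left, ~left)
            if min(m.sum() for m in masks) < min_samples:
                continue  # skip splits that leave a region nearly empty
            coefs = [fit_ridge(X[m], y[m], lam) for m in masks]
            err = sum(
                np.sum((y[m] - predict_linear(X[m], c)) ** 2) + lam * np.sum(c[:-1] ** 2)
                for m, c in zip(masks, coefs)
            )
            if best is None or err < best["error"]:
                best = {"feature": f, "breakpoint": float(b),
                        "coefs": coefs, "error": float(err)}
    return best


def predict(model, X):
    """Route each sample to its region, then apply that region's linear equation."""
    left = X[:, model["feature"]] <= model["breakpoint"]
    return np.where(left,
                    predict_linear(X, model["coefs"][0]),
                    predict_linear(X, model["coefs"][1]))


# Synthetic example: the response changes regime at x0 = 0.5; x1 is irrelevant.
rng = np.random.default_rng(0)
X_demo = rng.uniform(0.0, 1.0, size=(80, 2))
y_demo = np.where(X_demo[:, 0] <= 0.5, 2.0 * X_demo[:, 0], 5.0 - 3.0 * X_demo[:, 0])

model = fit_two_region_model(X_demo, y_demo)
print(model["feature"], model["breakpoint"])
```

The fitted model stays readable in the sense the abstract emphasises: one partitioning feature, one breakpoint, and one short linear equation per region. Exhaustive search scales poorly with the number of features and regions, which is one reason an optimisation-based formulation is attractive for the real problem.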
Pages: 13