Modelling methods and cross-validation variants in QSAR: a multi-level analysis

Cited by: 40
Authors
Racz, A. [1]
Bajusz, D. [2]
Heberger, K. [1]
Affiliations
[1] Hungarian Acad Sci, Res Ctr Nat Sci, Plasma Chem Res Grp, Budapest, Hungary
[2] Hungarian Acad Sci, Res Ctr Nat Sci, Med Chem Res Grp, Budapest, Hungary
Keywords
QSAR; toxicity; validation; SRD; cross-validation; MLR; PLS; PCR; SVM; ANN; BENZENE-DERIVATIVES; BIOLOGICAL-ACTIVITY; RANKING DIFFERENCES; PREDICTION; PARAMETERS; TOXICITY; PRINCIPLES; INHIBITORS; REGRESSION; 3D-QSAR
DOI
10.1080/1062936X.2018.1505778
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
Prediction performance often depends on the cross- and test validation protocols applied. Several combinations of different cross-validation variants and model-building techniques were used to reveal their complexity. Two case studies (acute toxicity data) were examined, applying five-fold cross-validation (with random, contiguous and Venetian blind forms) and leave-one-out cross-validation (CV). External test sets showed the effects and differences between the validation protocols. The models were generated with multiple linear regression (MLR), principal component regression (PCR), partial least squares (PLS) regression, artificial neural networks (ANN) and support vector machines (SVM). The comparisons were made by the sum of ranking differences (SRD) and factorial analysis of variance (ANOVA). The largest bias and variance could be assigned to the MLR method and contiguous block cross-validation. SRD can provide a unique and unambiguous ranking of methods and CV variants. Venetian blind cross-validation is a promising tool. The generated models were also compared based on their basic performance parameters (r² and Q²). MLR produced the largest gap between the two, while PCR gave the smallest. Although PCR is the best validated and balanced technique, SVM always outperformed the other methods when experimental values were the benchmark. Variable selection was advantageous, and the modelling method had a larger influence than the CV variant.
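The abstract names three ways of forming five-fold splits (random, contiguous, Venetian blind) plus leave-one-out, but does not define them. The sketch below shows how such fold assignments are commonly constructed in chemometrics practice; it is an illustration under that assumption, not the authors' implementation. All function names are hypothetical, and only the Python standard library is used.

```python
import random


def contiguous_folds(n_samples, n_folds=5):
    """Assign consecutive blocks of samples (in data order) to the same fold."""
    return [i * n_folds // n_samples for i in range(n_samples)]


def venetian_blind_folds(n_samples, n_folds=5):
    """Assign every n_folds-th sample to the same fold (interleaved pattern)."""
    return [i % n_folds for i in range(n_samples)]


def random_folds(n_samples, n_folds=5, seed=0):
    """Shuffle balanced fold labels so membership does not depend on data order."""
    labels = venetian_blind_folds(n_samples, n_folds)
    random.Random(seed).shuffle(labels)
    return labels


def leave_one_out_folds(n_samples):
    """Each sample forms its own fold."""
    return list(range(n_samples))


def split(labels, fold):
    """Return (train_indices, test_indices) for one held-out fold."""
    train = [i for i, f in enumerate(labels) if f != fold]
    test = [i for i, f in enumerate(labels) if f == fold]
    return train, test


if __name__ == "__main__":
    n = 10
    print("contiguous:    ", contiguous_folds(n))
    print("venetian blind:", venetian_blind_folds(n))
    print("random:        ", random_folds(n))
    print("leave-one-out: ", leave_one_out_folds(n))
```

When the data are sorted (for example by activity value), contiguous blocks hold out a whole region of the response range at once, whereas the Venetian blind pattern spreads each fold across that range; this is one plausible reading of why the two variants can behave differently.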
Pages: 661-674
Number of pages: 14