Predicting molecular activity on nuclear receptors by multitask neural networks

Times cited: 16
Authors
Valsecchi, Cecile [1]
Collarile, Magda [1]
Grisoni, Francesca [2]
Todeschini, Roberto [1]
Ballabio, Davide [1]
Consonni, Viviana [1]
Affiliations
[1] Univ Milano Bicocca, Dept Earth & Environm Sci, Milano Chemometr & QSAR Res Grp, Pza Sci 1, I-20126 Milan, Italy
[2] Swiss Fed Inst Technol, Dept Chem & Appl Biosci, Vladimir Prelog Weg 4, CH-8049 Zurich, Switzerland
Keywords
classification; deep learning; genetic algorithms; multitask; nuclear receptors; QSAR; MACHINE LEARNING-METHODS; GENETIC ALGORITHMS; PLS-REGRESSION; KOHONEN; SELECTION
DOI
10.1002/cem.3325
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Interest in multitask and deep learning strategies for quantitative structure-activity relationship (QSAR) analysis of large and complex datasets has been increasing in the last few years. Multitask approaches allow related molecular properties to be predicted simultaneously through information sharing, whereas deep learning strategies increase the potential for capturing nonlinear relationships. In this work, we compare the binary classification capability of multitask deep and shallow neural networks with that of single-task strategies used as benchmarks (i.e., k-nearest neighbours, N-nearest neighbours, random forest and Naive Bayes), as well as with multitask supervised self-organizing maps. The comparison was carried out on an extended QSAR dataset containing annotations of molecular binding, agonism and antagonism activity on 11 nuclear receptors, for a total of 14,963 molecules, divided into training and test sets and labelled for their bioactivity on at least one of 30 binary tasks. An additional 304 chemicals were used as an external evaluation set to further validate the models. Although no approach systematically outperformed the others, task-specific differences were found, suggesting that multitask learning benefits tasks that are less represented. On average, some of the single-task approaches and the multitask deep learning strategies had similar performances. However, the latter can have advantages, such as simpler management of predictions and applicability domain assessment for future samples. On the other hand, the parameter tuning required by neural networks is generally time-consuming, suggesting that the modelling strategy should be evaluated case by case.
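Because each molecule is labelled on at least one of the 30 binary tasks, but not necessarily on all of them, the molecule × task label matrix is sparse. One common way to train a multitask classifier on such data is to compute the loss only over the annotated (molecule, task) pairs. The sketch below assumes that approach; the abstract does not state how missing labels were handled, and the function name `masked_multitask_bce` and the use of `None` for a missing annotation are hypothetical choices made for illustration.

```python
import math
from typing import Optional, Sequence

# Hypothetical encoding: 1 = active, 0 = inactive, None = molecule not annotated for that task
Label = Optional[int]


def masked_multitask_bce(
    probs: Sequence[Sequence[float]],
    labels: Sequence[Sequence[Label]],
    eps: float = 1e-7,
) -> float:
    """Mean binary cross-entropy over annotated (molecule, task) pairs only.

    probs[i][t]  -- predicted probability that molecule i is active on task t
    labels[i][t] -- 1, 0, or None if molecule i has no annotation for task t
    """
    total, count = 0.0, 0
    for p_row, y_row in zip(probs, labels):
        for p, y in zip(p_row, y_row):
            if y is None:
                continue  # missing label: skipped, so it adds nothing to the loss
            p = min(max(p, eps), 1.0 - eps)  # clip to keep log() finite
            total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
            count += 1
    if count == 0:
        raise ValueError("batch contains no annotated labels")
    return total / count


# Two molecules, three tasks; each molecule is annotated on at least one task.
probs = [[0.9, 0.2, 0.5], [0.1, 0.8, 0.3]]
labels = [[1, None, 0], [None, 1, None]]
loss = masked_multitask_bce(probs, labels)
print(round(loss, 4))  # 0.3406
```

With this masking, all tasks share one network and one loss, while predictions for unannotated pairs have no influence on training. A single-task benchmark such as k-nearest neighbours would instead be fitted separately for each task, using only the molecules annotated for it.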
Pages: 18