Scaling Machine Learning for Target Prediction in Drug Discovery using Apache Spark

Cited by: 5
Authors
Harnie, Dries [1, 3]
Vapirev, Alexander E. [2, 3]
Wegner, Jorg Kurt [2]
Gedich, Andrey [6]
Steijaert, Marvin [7]
Wuyts, Roel [3, 4, 5]
De Meuter, Wolfgang [1]
Affiliations
[1] Vrije Univ Brussel, Software Languages Lab, Pl Laan 2, B-1050 Brussels, Belgium
[2] Janssen Pharmaceut, B-2340 Beerse, Belgium
[3] ExaSci Life Lab, B-3001 Leuven, Belgium
[4] IMEC, B-3001 Leuven, Belgium
[5] Katholieke Univ Leuven, DistriNet, B-3001 Leuven, Belgium
[6] ARCADIA Inc, Rostra Business Ctr, St Petersburg 195112, Russia
[7] OpenAnalytics, B-2220 Heist Op Den Berg, Belgium
Source
2015 15TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING | 2015
Keywords
IDENTIFICATION; TOOL;
DOI
10.1109/CCGrid.2015.50
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
In the context of drug discovery, a key problem is the identification of candidate molecules that affect proteins associated with diseases. Inside Janssen Pharmaceutica, the Chemogenomics project aims to derive new candidates from existing experiments through a set of machine learning predictor programs, written in single-node C++. These programs take a long time to run and are inherently parallel, but do not use multiple nodes. We show how we reimplemented the pipeline using Apache Spark, which enabled us to lift the existing programs to a multi-node cluster without making changes to the predictors. We have benchmarked our Spark pipeline against the original, which shows almost linear speedup up to 8 nodes. In addition, our pipeline generates fewer intermediate files while allowing easier checkpointing and monitoring.
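The abstract does not say how the unmodified predictors were invoked from Spark. One common pattern for lifting an existing command-line program onto partitioned data is to pipe each partition through the program's stdin and stdout, which is what Spark's `RDD.pipe` offers. The following is a minimal sketch of that pattern using only the Python standard library rather than Spark; the names `PREDICTOR_CMD`, `run_partition`, `split`, and `run_pipeline` are hypothetical, and the "predictor" is a trivial stand-in that upper-cases its input.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for an unmodified single-node predictor: it reads one
# record per line on stdin and writes one result per line on stdout.
PREDICTOR_CMD = [
    sys.executable,
    "-c",
    "import sys\nfor line in sys.stdin:\n    print(line.strip().upper())",
]


def run_partition(records):
    """Pipe one partition through the external program; return its output lines."""
    proc = subprocess.run(
        PREDICTOR_CMD,
        input="\n".join(records) + "\n",
        capture_output=True,
        text=True,
        check=True,  # raise if the external program exits with an error
    )
    return proc.stdout.splitlines()


def split(records, n_parts):
    """Split records into n_parts contiguous chunks of near-equal size."""
    size, extra = divmod(len(records), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        parts.append(records[start:end])
        start = end
    return parts


def run_pipeline(records, n_parts=4):
    """Run the external program on every non-empty partition in parallel."""
    parts = [p for p in split(records, n_parts) if p]
    with ThreadPoolExecutor(max_workers=len(parts) or 1) as pool:
        results = list(pool.map(run_partition, parts))  # preserves partition order
    return [line for part in results for line in part]
```

Because the external program is treated as a black box, the only contract is its line-oriented input and output. On a real cluster, a framework such as Spark would take over the partitioning and scheduling that `split` and the thread pool only imitate here.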
Pages: 871-879
Number of pages: 9