Scaling Machine Learning for Target Prediction in Drug Discovery using Apache Spark

Cited by: 5
Authors
Harnie, Dries [1, 3]
Vapirev, Alexander E. [2, 3]
Wegner, Jorg Kurt [2 ]
Gedich, Andrey [6 ]
Steijaert, Marvin [7 ]
Wuyts, Roel [3, 4, 5]
De Meuter, Wolfgang [1 ]
Affiliations
[1] Vrije Univ Brussel, Software Languages Lab, Pl Laan 2, B-1050 Brussels, Belgium
[2] Janssen Pharmaceut, B-2340 Beerse, Belgium
[3] ExaSci Life Lab, B-3001 Leuven, Belgium
[4] IMEC, B-3001 Leuven, Belgium
[5] Katholieke Univ Leuven, DistriNet, B-3001 Leuven, Belgium
[6] ARCADIA Inc, Rostra Business Ctr, St Petersburg 195112, Russia
[7] OpenAnalytics, B-2220 Heist Op Den Berg, Belgium
Source
2015 15TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING | 2015
Keywords
IDENTIFICATION; TOOL;
DOI
10.1109/CCGrid.2015.50
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
In the context of drug discovery, a key problem is the identification of candidate molecules that affect proteins associated with diseases. Inside Janssen Pharmaceutica, the Chemogenomics project aims to derive new candidates from existing experiments through a set of machine learning predictor programs, written in single-node C++. These programs take a long time to run and are inherently parallel, but do not use multiple nodes. We show how we reimplemented the pipeline using Apache Spark, which enabled us to lift the existing programs to a multi-node cluster without making changes to the predictors. We have benchmarked our Spark pipeline against the original, which shows almost linear speedup up to 8 nodes. In addition, our pipeline generates fewer intermediate files while allowing easier checkpointing and monitoring.
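The idea of lifting an unchanged program to parallel execution can be illustrated with a minimal sketch. This is not the authors' pipeline and does not use Spark: it is a local stand-in built on the Python standard library, in which each partition of the input is streamed through an external command over stdin/stdout (Spark's `RDD.pipe` offers a comparable mechanism on a cluster). The predictor command, the record format, and the helper names `run_predictor`, `split`, and `run_pipeline` are all hypothetical.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for an unchanged single-node predictor program:
# it reads one record per line on stdin and writes one result per line.
PREDICTOR_CMD = [
    sys.executable,
    "-c",
    "import sys\nfor line in sys.stdin:\n    print(line.strip().upper())",
]


def run_predictor(partition):
    """Stream one partition through the external program and collect its output."""
    result = subprocess.run(
        PREDICTOR_CMD,
        input="\n".join(partition) + "\n",
        capture_output=True,
        text=True,
        check=True,  # raise if the external program exits with an error
    )
    return result.stdout.splitlines()


def split(records, n_parts):
    """Distribute records round-robin over n_parts partitions."""
    return [records[i::n_parts] for i in range(n_parts)]


def run_pipeline(records, n_workers=4):
    """Run the external program on every non-empty partition in parallel."""
    partitions = [p for p in split(records, n_workers) if p]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        outputs = pool.map(run_predictor, partitions)
    # Results come back grouped by partition, not in input order.
    return [line for out in outputs for line in out]
```

Because the external program is only invoked through its command line, it needs no modification; the wrapper alone decides how the input is partitioned and how many copies run at once.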
Pages: 871 - 879
Number of pages: 9
Related papers
50 in total
  • [31] Similarity-based machine learning methods for predicting drug-target interactions: a brief review
    Ding, Hao
    Takigawa, Ichigaku
    Mamitsuka, Hiroshi
    Zhu, Shanfeng
    BRIEFINGS IN BIOINFORMATICS, 2014, 15 (05) : 734 - 747
  • [32] MPSM-DTI: prediction of drug-target interaction via machine learning based on the chemical structure and protein sequence
    Peng, Yayuan
    Wang, Jiye
    Wu, Zengrui
    Zheng, Lulu
    Wang, Biting
    Liu, Guixia
    Li, Weihua
    Tang, Yun
    DIGITAL DISCOVERY, 2022, 1 (02) : 115 - 126
  • [33] Recent advances in drug repurposing using machine learning
    Urbina, Fabio
    Puhl, Ana C.
    Ekins, Sean
    CURRENT OPINION IN CHEMICAL BIOLOGY, 2021, 65 : 74 - 84
  • [34] Comprehensive Survey of Recent Drug Discovery Using Deep Learning
    Kim, Jintae
    Park, Sera
    Min, Dongbo
    Kim, Wankyu
    INTERNATIONAL JOURNAL OF MOLECULAR SCIENCES, 2021, 22 (18)
  • [35] Binding affinity prediction for binary drug-target interactions using semi-supervised transfer learning
    Tanoori, Betsabeh
    Zolghadri Jahromi, Mansoor
    Mansoori, Eghbal G.
    JOURNAL OF COMPUTER-AIDED MOLECULAR DESIGN, 2021, 35 (08) : 883 - 900
  • [36] DDR: efficient computational method to predict drug-target interactions using graph mining and machine learning approaches
    Olayan, Rawan S.
    Ashoor, Haitham
    Bajic, Vladimir B.
    BIOINFORMATICS, 2018, 34 (07) : 1164 - 1173
  • [37] Machine learning prediction of oncology drug targets based on protein and network properties
    Dezso, Zoltan
    Ceccarelli, Michele
    BMC BIOINFORMATICS, 2020, 21 (01)
  • [38] The Development of Target-Specific Machine Learning Models as Scoring Functions for Docking-Based Target Prediction
    Nogueira, Mauro S.
    Koch, Oliver
    JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2019, 59 (03) : 1238 - 1252
  • [39] A Survey on Plant Disease Prediction using Machine Learning and Deep Learning Techniques
    Gokulnath, B., V
    Devi, Usha G.
    INTELIGENCIA ARTIFICIAL-IBEROAMERICAL JOURNAL OF ARTIFICIAL INTELLIGENCE, 2020, 23 (65) : 136 - 154
  • [40] PREDICTION OF REGULATORY sRNAs IN PROKARYOTES USING MACHINE LEARNING TOOLS
    Abu-halaweh, Nael
    Sabnis, Amit
    Harrison, Robert
    BIOINFORMATICS 2011, 2011, : 75 - 81