Scaling Machine Learning for Target Prediction in Drug Discovery using Apache Spark

Cited by: 5
Authors
Harnie, Dries [1, 3]
Vapirev, Alexander E. [2, 3]
Wegner, Jorg Kurt [2 ]
Gedich, Andrey [6 ]
Steijaert, Marvin [7 ]
Wuyts, Roel [3, 4, 5]
De Meuter, Wolfgang [1 ]
Affiliations
[1] Vrije Univ Brussel, Software Languages Lab, Pl Laan 2, B-1050 Brussels, Belgium
[2] Janssen Pharmaceut, B-2340 Beerse, Belgium
[3] ExaSci Life Lab, B-3001 Leuven, Belgium
[4] IMEC, B-3001 Leuven, Belgium
[5] Katholieke Univ Leuven, DistriNet, B-3001 Leuven, Belgium
[6] ARCADIA Inc, Rostra Business Ctr, St Petersburg 195112, Russia
[7] OpenAnalytics, B-2220 Heist Op Den Berg, Belgium
Source
2015 15TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING | 2015
Keywords
IDENTIFICATION; TOOL
DOI
10.1109/CCGrid.2015.50
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
In the context of drug discovery, a key problem is the identification of candidate molecules that affect proteins associated with diseases. Inside Janssen Pharmaceutica, the Chemogenomics project aims to derive new candidates from existing experiments through a set of machine learning predictor programs, written in single-node C++. These programs take a long time to run and are inherently parallel, but do not use multiple nodes. We show how we reimplemented the pipeline using Apache Spark, which enabled us to lift the existing programs to a multi-node cluster without making changes to the predictors. We have benchmarked our Spark pipeline against the original, which shows almost linear speedup up to 8 nodes. In addition, our pipeline generates fewer intermediate files while allowing easier checkpointing and monitoring.
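The idea of running unmodified single-node programs in parallel can be illustrated with a minimal sketch. It uses only the Python standard library, not Spark, and is not the authors' implementation: the input is split into partitions, and each partition is piped through an external program over stdin/stdout. Spark offers a comparable facility in `RDD.pipe`; this record does not state whether the authors used it. The names `PREDICTOR_CMD`, `split_into_partitions`, `run_predictor`, and `run_pipeline` are hypothetical, and the "predictor" here is a stand-in that merely upper-cases each input line.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for an unmodified external predictor:
# it reads records on stdin and writes one result per line on stdout.
PREDICTOR_CMD = [
    sys.executable,
    "-c",
    "import sys\nfor line in sys.stdin:\n    print(line.strip().upper())",
]


def split_into_partitions(records, n):
    """Deal the records round-robin into n partitions."""
    return [records[i::n] for i in range(n)]


def run_predictor(partition):
    """Pipe one partition through the external program and return its output lines."""
    if not partition:
        return []
    result = subprocess.run(
        PREDICTOR_CMD,
        input="\n".join(partition) + "\n",
        capture_output=True,
        text=True,
        check=True,  # raise if the external program exits with an error
    )
    return result.stdout.splitlines()


def run_pipeline(records, n_partitions=4):
    """Run the external program on every partition concurrently and merge the results."""
    partitions = split_into_partitions(records, n_partitions)
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        outputs = list(pool.map(run_predictor, partitions))
    return [line for output in outputs for line in output]
```

Because the external program is only ever addressed through its standard input and output, it needs no changes. On a real cluster, a framework such as Spark would take over the partitioning, scheduling, and fault handling that this sketch does by hand on one machine.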
Pages: 871 - 879
Page count: 9