Deep Semi-supervised Learning for Virtual Screening Based on Big Data Analytics

Cited by: 0
Authors
Bahi, Meriem [1]
Batouche, Mohamed
Affiliations
[1] Univ Constantine 2 Abdelhamid Mehri, Fac NTIC, Comp Sci Dept, Biotechnol Res Ctr CRBt, Constantine, Algeria
Source
BIG DATA, CLOUD AND APPLICATIONS, BDCA 2018 | 2018 / Vol. 872
Keywords
Drug discovery; Virtual screening; Deep learning; Stacked autoencoders; Big Data; H2O; Spark; MACHINE; DRUGS
DOI
10.1007/978-3-319-96292-4_14
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Scientists and researchers today face the problem of processing massive amounts of data, which is costly in both time and money. For this reason, researchers have turned to Deep Learning (DL) techniques based on Big Data Analytics. In addition, the ever-increasing volume of unlabelled data, combined with the difficulty of obtaining class labels, has made semi-supervised learning an alternative of significant practical importance in modern data analysis. In the same context, drug discovery has reached a level of complexity at which the use of Deep Semi-Supervised Learning and Big Data Processing Systems can no longer be avoided. Virtual Screening (VS) is a computationally intensive process that plays a major role in the early phase of drug discovery. VS must be made as fast as possible in order to efficiently dock the ligands from huge databases to a selected protein receptor. For these reasons, we propose a deep semi-supervised learning-based algorithmic framework named DeepSSL-VS that pre-filters the huge set of ligands so that virtual screening for the breast cancer protein receptor can be performed effectively. The framework combines stacked autoencoders with a deep neural network and is implemented on the Spark-H2O platform. The proposed technique has been compared with twenty-four different machine learning algorithms, all applied to the same reference datasets, and preliminary performance assessment results show that our approach outperforms these techniques, with an overall accuracy above 99%.
Pages: 173-184
Page count: 12