Hadoop-MCC: Efficient Multiple Compound Comparison Algorithm Using Hadoop

Times cited: 1
Authors
Hua, Guan-Jie [1]
Hung, Che-Lun [2, 3, 4]
Tang, Chuan Yi [1, 5]
Affiliations
[1] Natl Tsing Hua Univ, Dept Comp Sci, Hsinchu, Taiwan
[2] Providence Univ, Dept Comp Sci & Commun Engn, Taichung, Taiwan
[3] Providence Univ, Big Data Res Ctr, Taichung, Taiwan
[4] Med Univ, Canc Hosp & Inst Guangzhou, Guangzhou, Guangdong, Peoples R China
[5] Providence Univ, Dept Comp Sci & Informat Engn, Taichung, Taiwan
Keywords
LINGO; Hadoop; high performance computing; GPU; big data; compound comparison; DRUG DESIGN; MOLECULES
DOI
10.2174/1386207321666180102120641
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Aim and Objective: In the past decade, drug design technologies have improved enormously. Computer-aided drug design (CADD) has played an important role in analysis and prediction during drug development, making the process more economical and efficient. However, computation over big data, such as ZINC, which contains data on more than 60 million compounds, and GDB-13, which contains more than 930 million small molecules, is notoriously time-consuming. We therefore propose a novel heterogeneous high-performance computing method, named Hadoop-MCC, that integrates Hadoop and GPUs to cope efficiently with big chemical structure data.

Materials and Methods: Hadoop-MCC obtains high availability and fault tolerance from Hadoop, which is used to scatter input data to the GPU devices and gather the results from them. The Hadoop framework adopts the mapper/reducer (MapReduce) computation model. In the proposed method, mappers are responsible for fetching SMILES data segments and performing the LINGO method on GPUs; reducers then collect all comparison results produced by the mappers. Owing to Hadoop's high availability, all LINGO computational jobs on the mappers can be completed even if some mappers encounter problems.

Results: LINGO comparisons are performed on each GPU device in parallel. According to the experimental results, the proposed method on multiple GPU devices achieves better computational performance than CUDA-MCC on a single GPU device.

Conclusion: Hadoop-MCC achieves the scalability, high availability, and fault tolerance granted by Hadoop, as well as high performance, by integrating the computational power of both Hadoop and GPUs. The results show that a heterogeneous architecture such as Hadoop-MCC can effectively deliver better computational performance than a single GPU device.
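The mapper/reducer flow described in the Materials and Methods part of the abstract can be illustrated with a small, CPU-only Python sketch. It is not the authors' code: the GPU kernel is replaced by a plain Python LINGO comparison, Hadoop's task scheduling is simulated in a single process, and all names (`lingo_profile`, `lingo_similarity`, `mapper`, `reducer`, `run_job`, `fail_first_attempt`) are hypothetical. The similarity splits each SMILES string into overlapping 4-character substrings (LINGOs) and scores them with a multiset Tanimoto coefficient (sum of minimum counts over sum of maximum counts), one common LINGO formulation; the abstract does not state the exact formula Hadoop-MCC uses, and SMILES preprocessing steps found in some LINGO variants (such as normalizing ring-closure digits) are omitted.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterator, List, Tuple

Q = 4  # LINGO length; 4-character substrings are the usual choice

Record = Tuple[str, str]  # (compound id, SMILES string)


def lingo_profile(smiles: str, q: int = Q) -> Counter:
    """Count every overlapping length-q substring (LINGO) of a SMILES string."""
    if len(smiles) < q:
        return Counter([smiles])  # too short to split: treat the whole string as one LINGO
    return Counter(smiles[i:i + q] for i in range(len(smiles) - q + 1))


def lingo_similarity(a: Counter, b: Counter) -> float:
    """Multiset Tanimoto: sum of minimum counts over sum of maximum counts."""
    keys = a.keys() | b.keys()
    shared = sum(min(a[k], b[k]) for k in keys)
    total = sum(max(a[k], b[k]) for k in keys)
    return shared / total if total else 0.0


def mapper(segment: List[Record], refs: List[Record]) -> Iterator[Tuple[str, Tuple[str, float]]]:
    """Compare each query in one data segment against every reference compound.

    In Hadoop-MCC this step runs on a GPU; here it is plain Python on the CPU.
    """
    ref_profiles = [(rid, lingo_profile(s)) for rid, s in refs]
    for qid, qsmiles in segment:
        qprof = lingo_profile(qsmiles)
        for rid, rprof in ref_profiles:
            yield qid, (rid, lingo_similarity(qprof, rprof))


def reducer(key: str, values: List[Tuple[str, float]]) -> Tuple[str, List[Tuple[str, float]]]:
    """Gather all scores for one query, highest similarity first."""
    return key, sorted(values, key=lambda v: v[1], reverse=True)


def run_job(queries: List[Record], refs: List[Record], n_segments: int = 2,
            max_attempts: int = 3, fail_first_attempt: frozenset = frozenset()
            ) -> Dict[str, List[Tuple[str, float]]]:
    """Simulate scatter (map), shuffle, and gather (reduce) in one process.

    Map tasks whose index is in fail_first_attempt raise on their first attempt to
    mimic a failing mapper; the task is then re-run, loosely mirroring Hadoop's task retries.
    """
    segments = [queries[i::n_segments] for i in range(n_segments)]
    shuffled: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for idx, segment in enumerate(segments):
        for attempt in range(1, max_attempts + 1):
            try:
                if attempt == 1 and idx in fail_first_attempt:
                    raise RuntimeError(f"simulated failure of map task {idx}")
                output = list(mapper(segment, refs))  # keep output only if the task finishes
                break
            except RuntimeError:
                if attempt == max_attempts:
                    raise
        for key, value in output:
            shuffled[key].append(value)
    return dict(reducer(k, vs) for k, vs in shuffled.items())


if __name__ == "__main__":
    refs = [("benzene", "c1ccccc1"), ("phenol", "c1ccccc1O")]
    queries = [("q1", "c1ccccc1O"), ("q2", "CCOCC")]
    # q1 ranks phenol (1.0) ahead of benzene (5/6); q2 shares no LINGOs with either
    print(run_job(queries, refs, fail_first_attempt=frozenset({0})))
```

Buffering each map task's output until the task finishes means a failed attempt contributes nothing to the shuffle, so re-running it yields the same final result as a failure-free run. This is a simplified stand-in for the fault-tolerance property the abstract attributes to Hadoop.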
Pages: 84-92
Page count: 9