Accelerating Chemical Database Searching Using Graphics Processing Units

Cited by: 12
Authors
Liu, Pu [1]
Agrafiotis, Dimitris K. [1]
Rassokhin, Dmitrii N. [1]
Yang, Eric [1]
Affiliations
[1] Johnson & Johnson Pharmaceut Res & Dev LLC, Spring House, PA 19477 USA
Keywords
SMALL MOLECULES; FINGERPRINTS; DESCRIPTORS;
DOI
10.1021/ci200164g
Chinese Library Classification
R914 [Medicinal Chemistry];
Discipline classification code
100701;
Abstract
The utility of chemoinformatics systems depends on the accurate computer representation and efficient manipulation of chemical compounds. In such systems, a small molecule is often digitized as a large fingerprint vector, where each element indicates the presence/absence or the number of occurrences of a particular structural feature. Since in theory the number of unique features can be exceedingly large, these fingerprint vectors are usually folded into much shorter ones using hashing and modulo operations, allowing fast "in-memory" manipulation and comparison of molecules. There is increasing evidence that lossless fingerprints can substantially improve retrieval performance in chemical database searching (substructure or similarity), which has led to the development of several lossless fingerprint compression algorithms. However, any gains in storage and retrieval afforded by compression need to be weighed against the extra computational burden required for decompression before these fingerprints can be compared. Here we demonstrate that graphics processing units (GPUs) can greatly alleviate this problem, enabling the practical application of lossless fingerprints on large databases. More specifically, we show that, with the help of an ordinary ~$500 video card, the entire PubChem database of ~32 million compounds can be searched in ~0.2-2 s on average, which is 2 orders of magnitude faster than a conventional CPU. If multiple query patterns are processed in batch, the speedup is even more dramatic (less than 0.02-0.2 s/query for 1000 queries). In the present study, we use the Elias gamma compression algorithm, which results in a compression ratio as high as 0.097.
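To make the compression step concrete, the following is a minimal Python sketch of Elias gamma coding applied to a sparse fingerprint. It assumes the fingerprint is stored as a sorted list of "on" feature indices and that the gaps between consecutive indices are what gets encoded; the abstract does not spell out the exact layout used in the paper, so this layout and the function names `elias_gamma_encode`, `compress_fingerprint`, and `decompress_fingerprint` are illustrative assumptions, not the authors' implementation. Bit strings are used for readability rather than packed machine words.

```python
def elias_gamma_encode(n: int) -> str:
    """Return the Elias gamma code of a positive integer as a bit string."""
    if n < 1:
        raise ValueError("Elias gamma codes are defined only for integers >= 1")
    binary = bin(n)[2:]  # binary form; the leading bit is always 1
    # Prefix with (length - 1) zeros so the decoder knows how many bits to read.
    return "0" * (len(binary) - 1) + binary


def compress_fingerprint(on_bits) -> str:
    """Gamma-code the gaps between sorted, distinct feature indices.

    Assumption (not from the paper): indices are non-negative integers and
    the first gap is measured from -1, so every gap is at least 1.
    """
    pieces = []
    previous = -1
    for index in sorted(set(on_bits)):
        pieces.append(elias_gamma_encode(index - previous))
        previous = index
    return "".join(pieces)


def decompress_fingerprint(bitstring: str) -> list:
    """Invert compress_fingerprint and recover the sorted feature indices."""
    indices = []
    previous = -1
    pos = 0
    while pos < len(bitstring):
        zeros = 0
        while bitstring[pos] == "0":  # count the unary length prefix
            zeros += 1
            pos += 1
        gap = int(bitstring[pos:pos + zeros + 1], 2)
        pos += zeros + 1
        previous += gap
        indices.append(previous)
    return indices


# Hypothetical fingerprint with four features set.
features = [3, 17, 18, 1000]
encoded = compress_fingerprint(features)
assert decompress_fingerprint(encoded) == features  # the round trip is lossless
```

Because the code is lossless, decoding recovers every original feature index, but each comparison first pays for that decoding step. This is the overhead the abstract proposes to offload to the GPU.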
Pages: 1807-1816
Number of pages: 10