Ultrafast shape recognition for similarity search in molecular databases

Cited by: 57
Authors
Ballester, Pedro J. [1]
Richards, W. Graham [1]
Affiliation
[1] Univ Oxford, Phys & Theoret Chem Lab, Oxford OX1 3QZ, England
Source
PROCEEDINGS OF THE ROYAL SOCIETY A-MATHEMATICAL PHYSICAL AND ENGINEERING SCIENCES | 2007 / Vol. 463 / No. 2081
Keywords
molecular shape comparison; similarity search; pattern recognition; data explosion; virtual screening
DOI
10.1098/rspa.2007.1823
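Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes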
07 ; 0710 ; 09 ;
Abstract
Molecular databases are routinely screened for compounds that most closely resemble a molecule of known biological activity to provide novel drug leads. It is widely believed that three-dimensional molecular shape is the most discriminating pattern for biological activity as it is directly related to the steep repulsive part of the interaction potential between the drug-like molecule and its macromolecular target. However, efficient comparison of molecular shape is currently a challenge. Here, we show that a new approach based on moments of distance distributions is able to recognize molecular shape at least three orders of magnitude faster than current methodologies. Such an ultrafast method permits the identification of similarly shaped compounds within the largest molecular databases. In addition, the problematic requirement of aligning molecules for comparison is circumvented, as the proposed distributions are independent of molecular orientation. Our methodology could also be adapted to tackle similar hard problems in other fields, such as designing content-based Internet search engines for three-dimensional geometrical objects or performing fast similarity comparisons between proteins. From a broader perspective, we anticipate that ultrafast pattern recognition will soon become not only useful, but also essential to address the data explosion currently experienced in most scientific disciplines.
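As a rough illustration of the idea in the abstract, the sketch below encodes a set of atomic coordinates by moments of its distance distributions and checks that the encoding does not change when the molecule is rotated. The abstract states only that moments of distance distributions are used and that they are orientation independent; the four reference points, the three moments per distribution, the scoring function, the toy coordinates, and all function names here are assumptions chosen for illustration, not details given in this record.

```python
import math

def centroid(points):
    """Geometric centre of a list of (x, y, z) tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def moments(distances):
    """Mean, variance, and signed cube root of the third central moment."""
    n = len(distances)
    mean = sum(distances) / n
    var = sum((d - mean) ** 2 for d in distances) / n
    third = sum((d - mean) ** 3 for d in distances) / n
    return [mean, var, math.copysign(abs(third) ** (1 / 3), third)]

def shape_descriptor(points):
    """12 numbers: 3 moments for each of 4 reference points (assumed choice)."""
    ctd = centroid(points)
    d_ctd = [math.dist(p, ctd) for p in points]
    cst = points[d_ctd.index(min(d_ctd))]   # atom closest to the centroid
    fct = points[d_ctd.index(max(d_ctd))]   # atom farthest from the centroid
    d_fct = [math.dist(p, fct) for p in points]
    ftf = points[d_fct.index(max(d_fct))]   # atom farthest from fct
    desc = []
    for ref in (ctd, cst, fct, ftf):
        desc += moments([math.dist(p, ref) for p in points])
    return desc

def similarity(a, b):
    """Score in (0, 1]; 1 means identical descriptors (assumed scoring)."""
    mean_abs = sum(abs(x - y) for x, y in zip(a, b)) / len(a)
    return 1.0 / (1.0 + mean_abs)

def rotate_z(points, angle):
    """Rotate all points about the z axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]

# Hypothetical toy coordinates, not a real molecule.
mol = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.1, 1.3, 0.0),
       (3.6, 1.2, 0.4), (4.0, 2.5, 1.1)]
stretched = [(2.0 * x, y, z) for x, y, z in mol]

query = shape_descriptor(mol)
print(similarity(query, shape_descriptor(rotate_z(mol, 0.7))))
print(similarity(query, shape_descriptor(stretched)))
```

Because only interatomic distances enter the descriptor, no alignment step is needed before two molecules are compared, and each comparison reduces to arithmetic on two short vectors. That is one plausible reading of why such a scheme can be fast enough for very large databases.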
Pages: 1307-1321
Page count: 15
Related papers
23 in total
[1]   Evaluation of structural similarity based on reduced dimensionality representations of protein structure [J].
Albrecht, B ;
Grant, GH ;
Richards, WG .
PROTEIN ENGINEERING DESIGN & SELECTION, 2004, 17 (05) :425-432
[2]  
Baringhaus Karl-Heinz, 2004, Drug Discov Today Technol, V1, P197, DOI 10.1016/j.ddtec.2004.11.001
[3]   Molecular similarity: a key technique in molecular informatics [J].
Bender, A ;
Glen, RC .
ORGANIC & BIOMOLECULAR CHEMISTRY, 2004, 2 (22) :3204-3218
[4]  
BEMIS GW, 1992, J COMPUT AID MOL DES, V6, P607, DOI 10.1007/BF00126218
[5]  
Bohm Hans-Joachim, 2004, Drug Discov Today Technol, V1, P217, DOI 10.1016/j.ddtec.2004.10.009
[6]   Discovery informatics: its evolving role in drug discovery [J].
Claus, BL ;
Underwood, DJ .
DRUG DISCOVERY TODAY, 2002, 7 (18) :957-966
[7]   Shape-based retrieval and analysis of 3D models [J].
Funkhouser, T ;
Kazhdan, M ;
Min, P ;
Shilane, P .
COMMUNICATIONS OF THE ACM, 2005, 48 (06) :58-64
[8]   Explicit calculation of 3D molecular similarity [J].
Good, AC ;
Richards, WG .
PERSPECTIVES IN DRUG DISCOVERY AND DESIGN, 1998, 9-11 :321-338
[9]   NEW MOLECULAR SHAPE DESCRIPTORS - APPLICATION IN DATABASE SCREENING [J].
GOOD, AC ;
EWING, TJA ;
GSCHWEND, DA ;
KUNTZ, ID .
JOURNAL OF COMPUTER-AIDED MOLECULAR DESIGN, 1995, 9 (01) :1-12
[10]   Three-dimensional shape-based searching of conformationally flexible compounds [J].
Hahn, M .
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES, 1997, 37 (01) :80-86