Data-Driven High-Throughput Prediction of the 3-D Structure of Small Molecules: Review and Progress

Times cited: 24
Authors
Andronico, Alessio [1]
Randall, Arlo [1]
Benz, Ryan W. [1]
Baldi, Pierre [1, 2]
Affiliations
[1] Univ Calif Irvine, Sch Informat & Comp Sci, Inst Genom & Bioinformat, Irvine, CA 92697 USA
[2] Univ Calif Irvine, Dept Biol Chem, Irvine, CA 92697 USA
Keywords
INFORMATION-SYSTEM; SMILES; CHEMOINFORMATICS; GENERATION; DISCOVERY; DYNAMICS; CHEMDB; ENERGY; FIELD; AMBER
DOI
10.1021/ci100223t
Chinese Library Classification number
R914 [Medicinal chemistry]
Subject classification code
100701
Abstract
Accurate prediction of the 3-D structure of small molecules is essential for understanding their physical, chemical, and biological properties, including how they interact with other molecules. Here, we survey high-throughput methods for 3-D structure prediction and propose new target specifications for the next generation of methods. We then introduce COSMOS, a novel data-driven prediction method that uses libraries of fragment and torsion angle parameters. We illustrate COSMOS with parameters extracted from the Cambridge Structural Database (CSD), first analyzing their distribution and then evaluating the system's performance in terms of speed, coverage, and accuracy. The results show that COSMOS is a significant improvement over state-of-the-art prediction methods, particularly in its coverage of complex molecular structures, including metal-organics. COSMOS can predict structures for 96.4% of the molecules in the CSD (99.6% organic, 94.6% metal-organic), whereas the widely used commercial method CORINA predicts structures for 68.5% (98.5% organic, 51.6% metal-organic). On the common subset of molecules predicted by both methods, COSMOS makes predictions with an average time per molecule of 0.15 s (0.10 s organic, 0.21 s metal-organic) and an average RMSD of 1.57 Å (1.26 Å organic, 1.90 Å metal-organic), whereas CORINA makes predictions with an average time per molecule of 0.13 s (0.18 s organic, 0.08 s metal-organic) and an average RMSD of 1.60 Å (1.13 Å organic, 2.11 Å metal-organic). COSMOS is available through the ChemDB chemoinformatics Web portal at http://cdb.ics.uci.edu/.
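The abstract states that COSMOS builds structures from libraries of fragment and torsion angle parameters but does not spell out the assembly step. As a generic illustration only (not COSMOS code), the sketch below shows how one torsion angle, together with a bond length and a bond angle, fixes the 3-D position of the next atom, using the standard internal-to-Cartesian construction sometimes called NeRF. The helper names (`place_atom`, `dihedral`) and the numeric values are hypothetical assumptions chosen for the example.

```python
import math

Vec = tuple[float, float, float]


def sub(p: Vec, q: Vec) -> Vec:
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])


def dot(p: Vec, q: Vec) -> float:
    return p[0] * q[0] + p[1] * q[1] + p[2] * q[2]


def cross(p: Vec, q: Vec) -> Vec:
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])


def normalize(p: Vec) -> Vec:
    length = math.sqrt(dot(p, p))
    return (p[0] / length, p[1] / length, p[2] / length)


def place_atom(a: Vec, b: Vec, c: Vec, bond_length: float,
               bond_angle_deg: float, torsion_deg: float) -> Vec:
    """Hypothetical helper: position atom d bonded to c so that
    |d - c| = bond_length, angle(b, c, d) = bond_angle_deg and
    dihedral(a, b, c, d) = torsion_deg. a, b, c must not be collinear."""
    theta = math.radians(bond_angle_deg)
    phi = math.radians(torsion_deg)
    bc = normalize(sub(c, b))
    n = normalize(cross(sub(b, a), bc))  # unit normal of the a-b-c plane
    m = cross(n, bc)                     # unit vector completing the (bc, m, n) frame
    # Coordinates of d in the local frame centered on c.
    x = -bond_length * math.cos(theta)
    y = bond_length * math.sin(theta) * math.cos(phi)
    z = bond_length * math.sin(theta) * math.sin(phi)
    return (c[0] + x * bc[0] + y * m[0] + z * n[0],
            c[1] + x * bc[1] + y * m[1] + z * n[1],
            c[2] + x * bc[2] + y * m[2] + z * n[2])


def dihedral(a: Vec, b: Vec, c: Vec, d: Vec) -> float:
    """Signed dihedral angle a-b-c-d in degrees, in [-180, 180]."""
    b1, b2, b3 = sub(b, a), sub(c, b), sub(d, c)
    n1, n2 = cross(b1, b2), cross(b2, b3)
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Illustrative values only (not taken from the CSD): a carbon chain with a
# 1.54 angstrom C-C bond, a 109.5 degree bond angle and a gauche (60 degree) torsion.
c1, c2, c3 = (0.0, 0.0, 0.0), (1.54, 0.0, 0.0), (2.05, 1.45, 0.0)
c4 = place_atom(c1, c2, c3, 1.54, 109.5, 60.0)
print(round(dihedral(c1, c2, c3, c4), 1))  # prints 60.0
```

Repeating this placement along a molecule's rotatable bonds, with bond lengths, bond angles, and torsions looked up from a parameter library, yields a complete 3-D model. How COSMOS itself selects and combines its fragment and torsion parameters is not described in the abstract.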
Pages: 760-776
Page count: 17