Comparing structural fingerprints using a literature-based similarity benchmark

Cited by: 160
Authors
O'Boyle, Noel M. [1]
Sayle, Roger A. [1]
Affiliations
[1] NextMove Software, Innovation Centre, Cambridge Science Park, Milton Road, Cambridge CB4 0EY, England
Source
Journal of Cheminformatics | 2016 / Volume 8
Keywords
Similarity searching; Molecular fingerprints; Structural similarity; Similarity benchmark; PHENYLETHANOLAMINE N-METHYLTRANSFERASE; DEFINED ADRENERGIC AGENTS; VIRTUAL SCREENING METHODS; GASTRIC-ACID-SECRETION; NEIGHBORHOOD BEHAVIOR; MOLECULAR DESCRIPTOR; DATA SETS; IN-VITRO; INHIBITORS; ANALOGS;
DOI
10.1186/s13321-016-0148-0
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
Background: The concept of molecular similarity is one of the central ideas in cheminformatics, despite the fact that it is ill-defined and rather difficult to assess objectively. Here we propose a practical definition of molecular similarity in the context of drug discovery: molecules A and B are similar if a medicinal chemist would be likely to synthesise and test them around the same time as part of the same medicinal chemistry program. The attraction of such a definition is that it matches one of the key uses of similarity measures in early-stage drug discovery. If we make the assumption that molecules in the same compound activity table in a medicinal chemistry paper were considered similar by the authors of the paper, we can create a dataset of similar molecules from the medicinal chemistry literature. Furthermore, molecules with decreasing levels of similarity to a reference can be found by either ordering molecules in an activity table by their activity, or by considering activity tables in different papers which have at least one molecule in common.
Results: Using this procedure with activity data from ChEMBL, we have created two benchmark datasets for structural similarity that can be used to guide the development of improved measures. Compared to similar results from a virtual screen, these benchmarks are an order of magnitude more sensitive to differences between fingerprints both because of their size and because they avoid loss of statistical power due to the use of mean scores or ranks. We measure the performance of 28 different fingerprints on the benchmark sets and compare the results to those from the Riniker and Landrum (J Cheminf 5:26, 2013; doi:10.1186/1758-2946-5-26) ligand-based virtual screening benchmark.
Conclusions: Extended-connectivity fingerprints of diameter 4 and 6 are among the best performing fingerprints when ranking diverse structures by similarity, as is the topological torsion fingerprint. However, when ranking very close analogues, the atom pair fingerprint outperforms the others tested. When ranking diverse structures or carrying out a virtual screen, we find that the performance of the ECFP fingerprints significantly improves if the bit-vector length is increased from 1024 to 16,384.
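The abstract's point about bit-vector length can be illustrated with a minimal Python sketch. This is not the authors' code and does not generate real ECFP fingerprints: the integer feature identifiers, the modulo folding scheme, and the function names `fold`, `tanimoto`, and `rank_by_similarity` are all assumptions made for illustration, and the Tanimoto coefficient is used here as one common choice of similarity measure. The sketch only shows how folding features into a shorter bit vector can cause collisions that change similarity scores, and how candidates might be ranked against a reference.

```python
from typing import Dict, Iterable, List, Set, Tuple


def fold(feature_ids: Iterable[int], n_bits: int) -> Set[int]:
    """Fold integer feature identifiers into a bit vector of length n_bits.

    The result is the set of 'on' bit positions. Distinct features can
    collide on the same bit when n_bits is small (hypothetical scheme).
    """
    return {fid % n_bits for fid in feature_ids}


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient: shared on-bits divided by total distinct on-bits."""
    if not a and not b:
        return 0.0
    common = len(a & b)
    return common / (len(a) + len(b) - common)


def rank_by_similarity(
    reference: Iterable[int],
    candidates: Dict[str, Iterable[int]],
    n_bits: int,
) -> List[Tuple[str, float]]:
    """Rank candidates by decreasing Tanimoto similarity to the reference."""
    ref_bits = fold(reference, n_bits)
    scored = [
        (name, tanimoto(ref_bits, fold(features, n_bits)))
        for name, features in candidates.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Made-up feature identifiers: the two molecules share no features,
# but 5 and 1029 collide on the same bit when folded to 1024 bits.
reference_features = {5, 100}
unrelated_features = {1029, 200}

similarity_1024 = tanimoto(
    fold(reference_features, 1024), fold(unrelated_features, 1024)
)
similarity_16384 = tanimoto(
    fold(reference_features, 16384), fold(unrelated_features, 16384)
)
```

In this toy case the shorter vector reports a non-zero similarity between two feature sets that share nothing, while the longer vector does not. That is one plausible mechanism behind the reported improvement at 16,384 bits, though the sketch itself does not reproduce the paper's benchmark.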
Pages: 14