Comparing structural fingerprints using a literature-based similarity benchmark

Cited by: 160
Authors
O'Boyle, Noel M. [1]
Sayle, Roger A. [1]
Affiliations
[1] NextMove Software, Innovation Centre, Cambridge Science Park, Milton Road, Cambridge CB4 0EY, England
Source
Journal of Cheminformatics | 2016 / Volume 8
Keywords
Similarity searching; Molecular fingerprints; Structural similarity; Similarity benchmark; PHENYLETHANOLAMINE N-METHYLTRANSFERASE; DEFINED ADRENERGIC AGENTS; VIRTUAL SCREENING METHODS; GASTRIC-ACID-SECRETION; NEIGHBORHOOD BEHAVIOR; MOLECULAR DESCRIPTOR; DATA SETS; IN-VITRO; INHIBITORS; ANALOGS;
DOI
10.1186/s13321-016-0148-0
Chinese Library Classification (CLC) number
O6 [Chemistry]
Discipline classification code
0703
Abstract
Background: The concept of molecular similarity is one of the central ideas in cheminformatics, despite the fact that it is ill-defined and rather difficult to assess objectively. Here we propose a practical definition of molecular similarity in the context of drug discovery: molecules A and B are similar if a medicinal chemist would be likely to synthesise and test them around the same time as part of the same medicinal chemistry program. The attraction of such a definition is that it matches one of the key uses of similarity measures in early-stage drug discovery. If we make the assumption that molecules in the same compound activity table in a medicinal chemistry paper were considered similar by the authors of the paper, we can create a dataset of similar molecules from the medicinal chemistry literature. Furthermore, molecules with decreasing levels of similarity to a reference can be found by either ordering molecules in an activity table by their activity, or by considering activity tables in different papers which have at least one molecule in common.
Results: Using this procedure with activity data from ChEMBL, we have created two benchmark datasets for structural similarity that can be used to guide the development of improved measures. Compared to similar results from a virtual screen, these benchmarks are an order of magnitude more sensitive to differences between fingerprints both because of their size and because they avoid loss of statistical power due to the use of mean scores or ranks. We measure the performance of 28 different fingerprints on the benchmark sets and compare the results to those from the Riniker and Landrum (J Cheminf 5:26, 2013. doi:10.1186/1758-2946-5-26) ligand-based virtual screening benchmark.
Conclusions: Extended-connectivity fingerprints of diameter 4 and 6 are among the best performing fingerprints when ranking diverse structures by similarity, as is the topological torsion fingerprint. However, when ranking very close analogues, the atom pair fingerprint outperforms the others tested. When ranking diverse structures or carrying out a virtual screen, we find that the performance of the ECFP fingerprints significantly improves if the bit-vector length is increased from 1024 to 16,384.
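The abstract's last point, that a longer bit vector can change similarity results, can be illustrated with a minimal sketch. This is not the authors' code and does not compute real ECFP fingerprints: the feature identifiers below are made-up integers standing in for hashed substructures, and `fold` and `tanimoto` are hypothetical helper names. It only assumes that a hashed fingerprint is folded by taking each identifier modulo the bit-vector length, and that similarity is measured with the Tanimoto coefficient.

```python
def fold(feature_ids, n_bits):
    """Fold integer feature identifiers into the set of 'on' bit positions."""
    return {fid % n_bits for fid in feature_ids}


def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient: shared 'on' bits divided by all 'on' bits."""
    if not bits_a and not bits_b:
        return 0.0
    return len(bits_a & bits_b) / len(bits_a | bits_b)


# Made-up feature identifiers (assumption, for illustration only).
# The two "molecules" share exactly one feature: 5.
mol_a = {5, 1029, 2053, 40000}
mol_b = {5, 7000, 9000}

for n_bits in (1024, 16384):
    a, b = fold(mol_a, n_bits), fold(mol_b, n_bits)
    print(n_bits, len(a), len(b), round(tanimoto(a, b), 3))
```

In this toy case, three of the four features of `mol_a` collide on the same bit at length 1024, which inflates the similarity to `mol_b`; at length 16,384 no collisions occur and the score equals the one obtained from the unfolded feature sets.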
Pages: 14