One molecular fingerprint to rule them all: drugs, biomolecules, and the metabolome

Cited by: 200
Authors
Capecchi, Alice [1]
Probst, Daniel [1]
Reymond, Jean-Louis [1]
Affiliations
[1] Univ Bern, Dept Chem & Biochem, Freiestr 3, CH-3012 Bern, Switzerland
Funding
Swiss National Science Foundation
Keywords
Molecular fingerprints; Virtual screening; Chemical space; Databases; Locality sensitive hashing; DATABASE; SEARCH; SHAPE; ZINC; SETS;
DOI
10.1186/s13321-020-00445-4
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
Background: Molecular fingerprints are essential cheminformatics tools for virtual screening and mapping chemical space. Among the different types of fingerprints, substructure fingerprints perform best for small molecules such as drugs, while atom-pair fingerprints are preferable for large molecules such as peptides. However, no available fingerprint achieves good performance on both classes of molecules.
Results: Here we set out to design a new fingerprint suitable for both small and large molecules by combining substructure and atom-pair concepts. Our quest resulted in a new fingerprint called the MinHashed atom-pair fingerprint up to a diameter of four bonds (MAP4). In this fingerprint, the circular substructures with radii of r = 1 and r = 2 bonds around each atom in an atom-pair are written as two pairs of SMILES, each pair being combined with the topological distance separating the two central atoms. These so-called atom-pair molecular shingles are hashed, and the resulting set of hashes is MinHashed to form the MAP4 fingerprint. MAP4 significantly outperforms all other fingerprints on an extended benchmark that combines the Riniker and Landrum small molecule benchmark with a peptide benchmark recovering BLAST analogs from either scrambled or point mutation analogs. MAP4 furthermore produces well-organized chemical space tree-maps (TMAPs) for databases as diverse as DrugBank, ChEMBL, SwissProt and the Human Metabolome Database (HMDB), and differentiates between all metabolites in HMDB, over 70% of which are indistinguishable from their nearest neighbor using substructure fingerprints.
Conclusion: MAP4 is a new molecular fingerprint suitable for drugs, biomolecules, and the metabolome and can be adopted as a universal fingerprint to describe and search chemical space. The source code is available at [link missing], and interactive MAP4 similarity search tools and TMAPs for various databases are accessible at [link missing] and [link missing].
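The hashing and MinHashing steps described in the abstract can be illustrated with a minimal, standard-library sketch. This is not the authors' implementation. The shingle strings are hand-written placeholders in a hypothetical `SMILES|distance|SMILES` layout rather than output from a cheminformatics toolkit, and the SHA-1 hashing, the `(a*x + b) mod p` hash family, and the names `shingle_to_int`, `make_hash_params`, `minhash`, and `estimated_jaccard` are all assumptions chosen for illustration.

```python
import hashlib
import random

PRIME = (1 << 61) - 1    # modulus for the (a*x + b) mod p hash family (assumed)
MASK_32 = (1 << 32) - 1  # keep MinHash values within 32 bits


def shingle_to_int(shingle: str) -> int:
    """Hash a shingle string to a 32-bit integer (SHA-1 is an assumption)."""
    digest = hashlib.sha1(shingle.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "little")


def make_hash_params(n_perm: int, seed: int = 42):
    """Draw (a, b) coefficients for n_perm hash functions."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
            for _ in range(n_perm)]


def minhash(shingles, params):
    """Return one minimum per hash function over the set of hashed shingles."""
    hashed = {shingle_to_int(s) for s in shingles}
    if not hashed:
        raise ValueError("at least one shingle is required")
    return [min(((a * h + b) % PRIME) & MASK_32 for h in hashed)
            for a, b in params]


def estimated_jaccard(fp_a, fp_b) -> float:
    """Fraction of matching positions, which estimates Jaccard similarity."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    return sum(x == y for x, y in zip(fp_a, fp_b)) / len(fp_a)


# Placeholder shingles: two per atom pair (radius 1 and radius 2),
# each joined with a topological distance. Purely illustrative.
params = make_hash_params(128)
mol_a = {"C(C)O|1|C(C)N", "CCO|1|CCN", "C(C)O|3|c(c)c", "CCO|3|cc(c)c"}
mol_b = {"C(C)O|1|C(C)N", "CCO|1|CCN", "C(C)O|4|c(c)c", "CCO|4|cc(c)c"}
fp_a = minhash(mol_a, params)
fp_b = minhash(mol_b, params)
# The two sets share 2 of 6 distinct shingles, so the true Jaccard
# similarity is 1/3; the printed value is a MinHash estimate of it.
print(estimated_jaccard(fp_a, fp_b))
```

Because each fingerprint position keeps only the minimum hash under one hash function, two molecules agree at a position with probability close to the Jaccard similarity of their shingle sets, so fingerprints of fixed length can be compared without storing the sets themselves.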
Pages: 15