One molecular fingerprint to rule them all: drugs, biomolecules, and the metabolome

Citations: 200
Authors
Capecchi, Alice [1]
Probst, Daniel [1]
Reymond, Jean-Louis [1]
Affiliations
[1] Univ Bern, Dept Chem & Biochem, Freiestr 3, CH-3012 Bern, Switzerland
Funding
Swiss National Science Foundation
Keywords
Molecular fingerprints; Virtual screening; Chemical space; Databases; Locality sensitive hashing; DATABASE; SEARCH; SHAPE; ZINC; SETS;
DOI
10.1186/s13321-020-00445-4
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
Background: Molecular fingerprints are essential cheminformatics tools for virtual screening and mapping chemical space. Among the different types of fingerprints, substructure fingerprints perform best for small molecules such as drugs, while atom-pair fingerprints are preferable for large molecules such as peptides. However, no available fingerprint achieves good performance on both classes of molecules.

Results: Here we set out to design a new fingerprint suitable for both small and large molecules by combining substructure and atom-pair concepts. Our quest resulted in a new fingerprint called MinHashed atom-pair fingerprint up to a diameter of four bonds (MAP4). In this fingerprint, the circular substructures with radii of r = 1 and r = 2 bonds around each atom in an atom-pair are written as two pairs of SMILES, each pair being combined with the topological distance separating the two central atoms. These so-called atom-pair molecular shingles are hashed, and the resulting set of hashes is MinHashed to form the MAP4 fingerprint. MAP4 significantly outperforms all other fingerprints on an extended benchmark that combines the Riniker and Landrum small molecule benchmark with a peptide benchmark recovering BLAST analogs from either scrambled or point mutation analogs. MAP4 furthermore produces well-organized chemical space tree-maps (TMAPs) for databases as diverse as DrugBank, ChEMBL, SwissProt and the Human Metabolome Database (HMDB), and differentiates between all metabolites in HMDB, over 70% of which are indistinguishable from their nearest neighbor using substructure fingerprints.

Conclusion: MAP4 is a new molecular fingerprint suitable for drugs, biomolecules, and the metabolome and can be adopted as a universal fingerprint to describe and search chemical space. The source code, interactive MAP4 similarity search tools, and TMAPs for various databases are available online.
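
The shingle-then-MinHash pipeline described in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation. Several parts are assumptions made for illustration: the toy molecules `ETHANOL` and `PROPANOL` are hand-written graphs, the `environment` function uses a sorted list of neighbor labels as a simplified stand-in for the circular-substructure SMILES, and the SHA-1 hashing, the linear hash family modulo `PRIME`, and the 128 permutations are arbitrary choices.

```python
import hashlib
import random
from collections import deque
from itertools import combinations

# Toy molecules: (atom labels, adjacency list). Hypothetical stand-ins for a
# toolkit molecule object; each must be connected and have at least two atoms.
ETHANOL = (["C", "C", "O"], {0: [1], 1: [0, 2], 2: [1]})
PROPANOL = (["C", "C", "C", "O"], {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]})

PRIME = (1 << 61) - 1  # modulus of the illustrative hash family


def distances_from(start, adjacency):
    """Breadth-first search: topological distance (in bonds) to every atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for nbr in adjacency[atom]:
            if nbr not in dist:
                dist[nbr] = dist[atom] + 1
                queue.append(nbr)
    return dist


def environment(atom, radius, labels, dist_row):
    """Simplified circular environment: the centre label followed by the
    sorted labels of all atoms within `radius` bonds (not a real SMILES)."""
    shell = sorted(labels[a] for a, d in dist_row.items() if 0 < d <= radius)
    return labels[atom] + "(" + "".join(shell) + ")"


def shingles(molecule, radii=(1, 2)):
    """One 'environment|distance|environment' string per atom pair and radius."""
    labels, adjacency = molecule
    dist = {a: distances_from(a, adjacency) for a in adjacency}
    result = set()
    for a, b in combinations(sorted(adjacency), 2):
        for r in radii:
            env_a = environment(a, r, labels, dist[a])
            env_b = environment(b, r, labels, dist[b])
            first, second = sorted((env_a, env_b))  # order-independent pair
            result.add(f"{first}|{dist[a][b]}|{second}")
    return result


def minhash(shingle_set, n_perm=128, seed=42):
    """Hash each shingle, then keep the minimum under n_perm linear maps."""
    rng = random.Random(seed)
    params = [(rng.randrange(1, PRIME), rng.randrange(PRIME))
              for _ in range(n_perm)]
    hashes = [int.from_bytes(hashlib.sha1(s.encode()).digest()[:8], "big")
              for s in shingle_set]
    return [min((a * h + b) % PRIME for h in hashes) for a, b in params]


def estimated_jaccard(fp1, fp2):
    """Fraction of matching positions estimates the shingle-set Jaccard index."""
    return sum(x == y for x, y in zip(fp1, fp2)) / len(fp1)


fp_ethanol = minhash(shingles(ETHANOL))
fp_propanol = minhash(shingles(PROPANOL))
similarity = estimated_jaccard(fp_ethanol, fp_propanol)
```

Because both fingerprints are built with the same seed, comparing them position by position gives an estimate of how many shingles the two molecules share; a fingerprint compared with itself always scores 1.0.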
Pages: 15
Related papers
49 items in total
  • [1] ALTSCHUL SF, 1990, J MOL BIOL, V215, P403, DOI 10.1006/jmbi.1990.9999
  • [2] Andoni A, 2017, PROCEEDINGS OF THE TWENTY-EIGHTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, P67
  • [3] Polypharmacology Browser PPB2: Target Prediction Combining Nearest Neighbors with Machine Learning
    Awale, Mahendra
    Reymond, Jean-Louis
    [J]. JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2019, 59 (01) : 10 - 17
  • [4] Awale M, 2019, METHODS MOL BIOL, V1888, P255, DOI 10.1007/978-1-4939-8891-4_15
  • [5] Chemical Space: Big Data Challenge for Molecular Diversity
    Awale, Mahendra
    Visini, Ricardo
    Probst, Daniel
    Arus-Pous, Josep
    Reymond, Jean-Louis
    [J]. CHIMIA, 2017, 71 (10) : 661 - 666
  • [6] Stereoselective virtual screening of the ZINC database using atom pair 3D-fingerprints
    Awale, Mahendra
    Jin, Xian
    Reymond, Jean-Louis
    [J]. JOURNAL OF CHEMINFORMATICS, 2015, 7
  • [7] Atom Pair 2D-Fingerprints Perceive 3D-Molecular Shape and Pharmacophores for Very Fast Virtual Screening of ZINC and GDB-17
    Awale, Mahendra
    Reymond, Jean-Louis
    [J]. JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2014, 54 (07) : 1892 - 1907
  • [8] Bajusz D, 2017, COMPREHENSIVE MEDICINAL CHEMISTRY III, VOL 3: IN SILICO DRUG DISCOVERY TOOLS, P329, DOI 10.1016/B978-0-12-409547-2.12345-5
  • [9] UniProt: a worldwide hub of protein knowledge
    Bateman, Alex
    Martin, Maria-Jesus
    Orchard, Sandra
    Magrane, Michele
    Alpi, Emanuele
    Bely, Benoit
    Bingley, Mark
    Britto, Ramona
    Bursteinas, Borisas
    Busiello, Gianluca
    Bye-A-Jee, Hema
    Da Silva, Alan
    De Giorgi, Maurizio
    Dogan, Tunca
    Castro, Leyla Garcia
    Garmiri, Penelope
    Georghiou, George
    Gonzales, Daniel
    Gonzales, Leonardo
    Hatton-Ellis, Emma
    Ignatchenko, Alexandr
    Ishtiaq, Rizwan
    Jokinen, Petteri
    Joshi, Vishal
    Jyothi, Dushyanth
    Lopez, Rodrigo
    Luo, Jie
    Lussi, Yvonne
    MacDougall, Alistair
    Madeira, Fabio
    Mahmoudy, Mahdi
    Menchi, Manuela
    Nightingale, Andrew
    Onwubiko, Joseph
    Palka, Barbara
    Pichler, Klemens
    Pundir, Sangya
    Qi, Guoying
    Raj, Shriya
    Renaux, Alexandre
    Lopez, Milagros Rodriguez
    Saidi, Rabie
    Sawford, Tony
    Shypitsyna, Aleksandra
    Speretta, Elena
    Turner, Edward
    Tyagi, Nidhi
    Vasudev, Preethi
    Volynkin, Vladimir
    Wardell, Tony
    [J]. NUCLEIC ACIDS RESEARCH, 2019, 47 (D1) : D506 - D515
  • [10] Bawa M., 2005, WORLD WID WEB C, P651, DOI 10.1145/1060745.1060840