Exploring chemical space for lead identification by propagating on chemical similarity network

Times cited: 3
Authors
Yi, Jungseob [1]
Lee, Sangseon [2]
Lim, Sangsoo [3]
Cho, Changyun [4]
Piao, Yinhua [5]
Yeo, Marie [6]
Kim, Dongkyu [6]
Kim, Sun [1, 4, 5, 7]
Lee, Sunho [7]
Affiliations
[1] Seoul Natl Univ, Interdisciplinary Program Artificial Intelligence, Gwanak Ro 1,Gwanak Gu, Seoul 08826, South Korea
[2] Seoul Natl Univ, Inst Comp Technol, Gwanak Ro 1,Gwanak Gu, Seoul 08826, South Korea
[3] Dongguk Univ, Sch AI Software Convergence, Pildong Ro 1 Gil,Jung Gu, Seoul, South Korea
[4] Seoul Natl Univ, Interdisciplinary Program Bioinformat, Gwanak Ro 1,Gwanak Gu, Seoul 08826, South Korea
[5] Seoul Natl Univ, Dept Comp Sci & Engn, Gwanak Ro 1,Gwanak Gu, Seoul 08826, South Korea
[6] PHARMGENSCI CO LTD, 216,Dongjak Daero,Seocho Gu, Seoul 06554, South Korea
[7] AIGENDRUG CO LTD, Gwanak Ro 1,Gwanak Gu, Seoul 08826, South Korea
Funding
National Research Foundation of Singapore
Keywords
Lead identification; Data mining; Chemical network construction; Network propagation; DRUG DISCOVERY; LEARNING APPROACH; DATABASE; PREDICTION; MOLECULES; MODEL
DOI
10.1016/j.csbj.2023.08.016
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification code
071010; 081704
Abstract
Motivation: Lead identification is a fundamental step in prioritizing candidate compounds for the downstream drug discovery process. Machine learning (ML) and deep learning (DL) approaches are widely used to identify lead compounds from both chemical properties and experimental information. However, ML and DL methods rarely use compound similarity information directly, because these models are built on abstract representations of molecules. Alternatively, data mining approaches are used to explore the chemical space of drug candidates by screening out undesirable compounds. A major challenge for data mining is to develop efficient methods that search a large chemical space for desirable lead compounds with a low false positive rate.

Results: In this work, we developed a network propagation (NP)-based data mining method for lead identification that searches an ensemble of chemical similarity networks. We compiled 14 fingerprint-based similarity networks. Given a target protein of interest, we use a deep learning-based drug-target interaction model to narrow down the compound candidates, and then use network propagation to prioritize candidates that are highly correlated with drug activity scores such as IC50. In an extensive experiment with BindingDB, we showed that our approach successfully discovered compounds that had been intentionally unlabeled for the given targets. To further demonstrate the predictive power of our approach, we identified 24 candidate leads for CLK1. Two of the five synthesizable candidates were experimentally validated in binding assays. In conclusion, our framework can be very useful for lead identification from very large compound databases such as ZINC.
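The abstract describes scoring compounds by propagating activity information over an ensemble of fingerprint-based similarity networks, but it does not give the propagation equation, the fingerprint types, the edge threshold, or how scores from the 14 networks are combined. The sketch below is therefore only an illustration under stated assumptions: fingerprints are represented as sets of on-bit indices, edges connect compounds whose Tanimoto similarity reaches a hypothetical threshold, propagation uses random walk with restart (a standard network propagation formulation, not necessarily the authors' exact one), seed weights stand in for known activities, and ensemble scores are a plain mean across networks. The function names (`tanimoto`, `similarity_network`, `random_walk_with_restart`, `ensemble_scores`) are hypothetical, and the deep learning drug-target interaction pre-filtering step is omitted.

```python
from typing import Dict, List, Set

# ASSUMPTION: a fingerprint is the set of indices of its "on" bits.
Fingerprint = Set[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two binary fingerprints."""
    if not a and not b:
        return 0.0
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)


def similarity_network(fps: List[Fingerprint], threshold: float) -> List[List[float]]:
    """Symmetric weighted adjacency matrix; an edge is kept only if similarity >= threshold."""
    n = len(fps)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            s = tanimoto(fps[i], fps[j])
            if s >= threshold:
                adj[i][j] = adj[j][i] = s
    return adj


def random_walk_with_restart(adj: List[List[float]], seeds: Dict[int, float],
                             alpha: float = 0.5, tol: float = 1e-9,
                             max_iter: int = 1000) -> List[float]:
    """Iterate p <- (1 - alpha) * W p + alpha * p0, with W the column-normalized adjacency.

    ASSUMPTION: alpha (restart probability) is a hypothetical default. Mass sitting on
    an isolated node (zero column sum) is not spread further.
    """
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    total = sum(seeds.values())
    p0 = [seeds.get(i, 0.0) / total for i in range(n)]  # normalized seed vector
    p = p0[:]
    for _ in range(max_iter):
        new = []
        for i in range(n):
            spread = sum(adj[i][j] / col_sums[j] * p[j] for j in range(n) if col_sums[j] > 0)
            new.append((1 - alpha) * spread + alpha * p0[i])
        if sum(abs(x - y) for x, y in zip(new, p)) < tol:  # L1 convergence check
            return new
        p = new
    return p


def ensemble_scores(fp_sets: List[List[Fingerprint]], seeds: Dict[int, float],
                    threshold: float = 0.3, alpha: float = 0.5) -> List[float]:
    """ASSUMPTION: average the propagated scores over one network per fingerprint type."""
    n = len(fp_sets[0])
    totals = [0.0] * n
    for fps in fp_sets:
        scores = random_walk_with_restart(similarity_network(fps, threshold), seeds, alpha)
        totals = [t + s for t, s in zip(totals, scores)]
    return [t / len(fp_sets) for t in totals]


# Toy data: five compounds described by two made-up fingerprint types.
fps_a = [{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 3, 4, 5}, {10, 11, 12}, {10, 11, 13}]
fps_b = [{1, 2}, {1, 3}, {1, 2, 3}, {7, 8}, {7, 9}]
seeds = {0: 1.0, 1: 1.0}  # compounds 0 and 1 treated as known actives (hypothetical)

scores = ensemble_scores([fps_a, fps_b], seeds)
ranked = sorted((i for i in range(len(scores)) if i not in seeds), key=lambda i: -scores[i])
print(ranked)  # -> [2, 3, 4]
```

In this toy example, compound 2 shares most of its bits with the two seed actives and therefore receives propagated score, while compounds 3 and 4 sit in a separate component that no seed reaches and score exactly zero. Averaging over several networks is one simple way to make the ranking less dependent on any single fingerprint type; weighted or rank-based aggregation would be equally plausible choices.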
Pages: 4187-4195
Page count: 9