Clustering of Synthetic Routes Using Tree Edit Distance

Cited by: 11
Authors
Genheden, Samuel [1]
Engkvist, Ola [1]
Bjerrum, Esben [1]
Affiliations
[1] AstraZeneca Gothenburg, R&D, Discovery Sci, Mol AI, SE-43183 Molndal, Sweden
Keywords
COMPUTER
DOI
10.1021/acs.jcim.1c00232
Chinese Library Classification number
R914 [Medicinal Chemistry]
Discipline classification code
100701
Abstract
We present a novel algorithm that computes the distance between synthetic routes based on tree edit distances. Such distances can be used to cluster synthesis routes generated by a retrosynthesis prediction tool. We show that clustering the selected routes from a retrosynthesis analysis takes less than 10 s on average and constitutes only seven percent of the total time (prediction + clustering). Furthermore, we show that representative routes from each cluster can be used to reduce the set of predicted routes. Finally, we show with a number of examples that the algorithm gives intuitive clusters that can be easily rationalized and that the routes in a cluster tend to use similar chemistry. The algorithm is included in the latest version of the open-source AiZynthFinder software (https://github.com/MolecularAI/aiZynthFinder) and as a separate package (https://github.com/MolecularAI/route-distances).
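The idea in the abstract, computing tree edit distances between routes and clustering on them, can be illustrated with a minimal sketch. It is not the authors' implementation and does not use the route-distances API. It assumes routes are ordered trees with unit edit costs, uses made-up node labels ("T", "A", "B", ...) and hypothetical helper names (node, forest_distance, tree_distance, cluster), and substitutes a small complete-linkage agglomerative clustering written with the standard library only.

```python
from functools import lru_cache
from itertools import combinations


def node(label, *children):
    """Build a tree as (label, children); a forest is a tuple of trees."""
    return (label, tuple(children))


@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Unit-cost edit distance between two ordered forests (assumed cost model)."""
    if not f and not g:
        return 0
    if not g:
        # Delete the rightmost root of f; its children take its place.
        return forest_distance(f[:-1] + f[-1][1], g) + 1
    if not f:
        # Insert the rightmost root of g.
        return forest_distance(f, g[:-1] + g[-1][1]) + 1
    (v_label, v_kids), (w_label, w_kids) = f[-1], g[-1]
    delete = forest_distance(f[:-1] + v_kids, g) + 1
    insert = forest_distance(f, g[:-1] + w_kids) + 1
    # Map the two rightmost roots onto each other (free if the labels agree).
    match = (forest_distance(v_kids, w_kids)
             + forest_distance(f[:-1], g[:-1])
             + (v_label != w_label))
    return min(delete, insert, match)


def tree_distance(a, b):
    """Edit distance between two route trees."""
    return forest_distance((a,), (b,))


def cluster(dist, n_clusters):
    """Complete-linkage agglomerative clustering on a distance matrix."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        # Merge the pair of clusters whose largest pairwise distance is smallest.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: max(dist[a][b]
                              for a in clusters[p[0]]
                              for b in clusters[p[1]]),
        )
        clusters[i] += clusters.pop(j)
    return clusters


# Three toy routes to the same target "T" (labels are purely illustrative).
route_a = node("T", node("A"), node("B"))
route_b = node("T", node("A"), node("C"))
route_c = node("T", node("X", node("Y"), node("Z")))
routes = [route_a, route_b, route_c]

dist = [[tree_distance(r, s) for s in routes] for r in routes]
groups = cluster(dist, n_clusters=2)
print(dist)    # [[0, 1, 3], [1, 0, 3], [3, 3, 0]]
print(groups)  # [[0, 1], [2]]
```

Routes a and b differ by a single relabelled precursor and land in the same cluster, while route c goes through a different intermediate and stays on its own. A real application would need a chemically meaningful node-substitution cost rather than the 0/1 label comparison assumed here.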
Pages: 3899-3907
Number of pages: 9