Clustering of Synthetic Routes Using Tree Edit Distance

Cited by: 11
Authors
Genheden, Samuel [1]
Engkvist, Ola [1]
Bjerrum, Esben [1]
Affiliations
[1] AstraZeneca Gothenburg, R&D, Discovery Sci, Mol AI, SE-43183 Molndal, Sweden
Keywords
COMPUTER;
DOI
10.1021/acs.jcim.1c00232
Chinese Library Classification number
R914 [Medicinal Chemistry];
Subject classification code
100701;
Abstract
We present a novel algorithm to compute the distance between synthetic routes based on tree edit distances. Such distances can be used to cluster synthesis routes generated using a retrosynthesis prediction tool. We show that clustering the selected routes from a retrosynthesis analysis takes less than 10 s on average and accounts for only seven percent of the total time (prediction + clustering). Furthermore, we show that representative routes from each cluster can be used to reduce the set of predicted routes. Finally, we show with a number of examples that the algorithm gives intuitive clusters that can be easily rationalized and that the routes in a cluster tend to use similar chemistry. The algorithm is included in the latest version of the open-source AiZynthFinder software (https://github.com/MolecularAI/aiZynthFinder) and as a separate package (https://github.com/MolecularAI/route-distances).
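
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm and does not use the route-distances package: it assumes routes are encoded as nested tuples with hypothetical placeholder labels, substitutes a plain ordered tree edit distance with unit costs, and groups routes with a simple distance threshold (single linkage) instead of the clustering method used in the paper.

```python
from functools import lru_cache

# Assumption: a route is a nested tuple (label, (child, child, ...)).
# The labels below are hypothetical placeholders, not real molecules or reactions.
route_a = ("T", (("rxn1", (("A", ()), ("B", ()))),))
route_b = ("T", (("rxn1", (("A", ()), ("C", ()))),))
route_c = ("T", (("rxn2", (("D", (("rxn3", (("E", ()), ("F", ()))),)),)),))


def size(forest):
    """Count the nodes in a forest (a tuple of trees)."""
    return sum(1 + size(children) for _, children in forest)


@lru_cache(maxsize=None)
def forest_distance(f1, f2):
    """Ordered forest edit distance with unit insert, delete and relabel costs."""
    if not f1:
        return size(f2)  # insert every remaining node
    if not f2:
        return size(f1)  # delete every remaining node
    (label1, kids1), (label2, kids2) = f1[-1], f2[-1]
    # Delete the rightmost root of f1; its children take its place.
    delete = forest_distance(f1[:-1] + kids1, f2) + 1
    # Insert the rightmost root of f2.
    insert = forest_distance(f1, f2[:-1] + kids2) + 1
    # Match the two rightmost roots, relabeling if they differ.
    match = (forest_distance(kids1, kids2)
             + forest_distance(f1[:-1], f2[:-1])
             + (label1 != label2))
    return min(delete, insert, match)


def route_distance(r1, r2):
    """Tree edit distance between two routes."""
    return forest_distance((r1,), (r2,))


def distance_matrix(routes):
    """Pairwise distances between all routes."""
    return [[route_distance(a, b) for b in routes] for a in routes]


def threshold_clusters(matrix, threshold):
    """Group route indices whose distance is at most `threshold` (single linkage)."""
    parent = list(range(len(matrix)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(matrix)):
        for j in range(i + 1, len(matrix)):
            if matrix[i][j] <= threshold:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(matrix)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


matrix = distance_matrix([route_a, route_b, route_c])
clusters = threshold_clusters(matrix, threshold=2)
print(matrix)
print(clusters)
```

In this toy example the first two routes differ by a single relabeled leaf and fall into the same group, while the longer third route stays on its own. A representative route per group could then be picked, for instance the member with the smallest summed distance to the others.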
Pages: 3899-3907
Page count: 9