Machine learning methods to predict the crystallization propensity of small organic molecules

Times cited: 11
Author
Pereira, Florbela [1, 2]
Affiliations
[1] Univ Nova Lisboa, Fac Ciencias & Tecnol, Dept Quim, LAQV, Caparica, Portugal
[2] Univ Nova Lisboa, Fac Ciencias & Tecnol, Dept Quim, REQUIMTE, Caparica, Portugal
Keywords
CLASSIFICATION; STABILITY; TENDENCY
DOI
10.1039/d0ce00070a
Chinese Library Classification (CLC) number
O6 [Chemistry]
Subject classification code
0703
Abstract
Machine learning (ML) algorithms were explored for predicting crystallization propensity from molecular descriptors and fingerprints generated from 2D chemical structures, and from 3D molecular descriptors calculated from 3D chemical structures optimized with empirical methods. In total, 57,815 molecules were retrieved from the Reaxys® database, of which 53,998 are recorded as crystalline (class A), 3,097 as polymorphic (class B), and 720 as amorphous (class C). A training set of 40,462 organic molecules was used to build the models, which were validated with an external test set of 17,353 organic molecules. Several ML algorithms, including random forest (RF), support vector machines (SVM), and deep learning multilayer perceptron networks (MLP), were screened. The best performance was achieved by a consensus classification model combining the RF, SVM, and MLP models, which predicted the external test set with an overall predictive accuracy (Q) of up to 80%.
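The abstract does not state how the RF, SVM, and MLP outputs are merged into the consensus model. The following is a minimal sketch, assuming a simple per-molecule majority vote over the three class labels (A, B, C), with a hypothetical fallback to the RF label when all three models disagree. It also assumes that Q is the fraction of molecules whose class is predicted correctly. The function names `consensus_vote` and `overall_accuracy` are illustrative only.

```python
from collections import Counter
from typing import List, Sequence


def consensus_vote(rf: Sequence[str], svm: Sequence[str], mlp: Sequence[str]) -> List[str]:
    """Combine three per-molecule class predictions by majority vote.

    Assumption (hypothetical rule): if all three models disagree,
    the RF prediction is kept as the tie-breaker.
    """
    if not (len(rf) == len(svm) == len(mlp)):
        raise ValueError("all models must predict the same molecules")
    combined = []
    for votes in zip(rf, svm, mlp):
        label, count = Counter(votes).most_common(1)[0]
        # At least two of the three models agree: take the majority label.
        # Otherwise fall back to the first (RF) prediction.
        combined.append(label if count >= 2 else votes[0])
    return combined


def overall_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Overall predictive accuracy Q: fraction of molecules classified correctly."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


if __name__ == "__main__":
    # Toy labels for five molecules (A = crystalline, B = polymorphic, C = amorphous).
    y_true = ["A", "A", "B", "C", "B"]
    rf = ["A", "B", "B", "C", "A"]
    svm = ["A", "A", "C", "A", "B"]
    mlp = ["B", "A", "B", "C", "C"]
    consensus = consensus_vote(rf, svm, mlp)
    print(consensus)                            # ['A', 'A', 'B', 'C', 'A']
    print(overall_accuracy(y_true, consensus))  # 0.8
```

Majority voting on hard labels is only one way to build a consensus. Averaging the class probabilities of the three models is a common alternative, and the abstract does not say which approach was used.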
Pages: 2817-2826
Number of pages: 10