MIM-ML: A Novel Quantum Chemical Fragment-Based Random Forest Model for Accurate Prediction of NMR Chemical Shifts of Nucleic Acids

Cited by: 2
Authors
Chandy, Sruthy K. [1]
Raghavachari, Krishnan [1]
Affiliations
[1] Indiana Univ, Dept Chem, Bloomington, IN 47405 USA
Funding
U.S. National Science Foundation;
Keywords
DENSITY-FUNCTIONAL THEORY; STRUCTURE VALIDATION; PROTEIN-STRUCTURE; NATURAL-PRODUCTS; DNA DUPLEXES; FORCE-FIELD; C-13; H-1; RNA; DYNAMICS;
DOI
10.1021/acs.jctc.3c00563
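Chinese Library Classification number
O64 [Physical chemistry (theoretical chemistry), chemical physics];
Discipline classification code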
070304 ; 081704 ;
Abstract
We developed a random forest machine learning (ML) model for the prediction of H-1 and C-13 NMR chemical shifts of nucleic acids. Our ML model is trained entirely to reproduce computed chemical shifts obtained previously for 10 nucleic acids using a Molecules-in-Molecules (MIM) fragment-based density functional theory (DFT) protocol that includes microsolvation effects. The model uses structural descriptors as well as electronic descriptors from an inexpensive low-level semiempirical calculation (GFN2-xTB) and is trained on a relatively small number of DFT chemical shifts (2080 H-1 chemical shifts and 1780 C-13 chemical shifts for the 10 nucleic acids). The ML model is then used to predict chemical shifts for 8 new nucleic acids ranging in size from 600 to 900 atoms, and the predictions are compared directly to experimental data. Although no experimental data were used in training, the performance of our model is excellent (mean absolute deviation of 0.34 ppm for H-1 chemical shifts and 2.52 ppm for C-13 chemical shifts for the test set), even though the test set includes some nonstandard structures. A simple analysis suggests that both structural and electronic descriptors are critical for achieving reliable predictions. This is the first attempt to combine ML with fragment-based DFT calculations to predict experimental chemical shifts accurately, making the MIM-ML model a valuable tool for NMR predictions of nucleic acids.
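The abstract reports accuracy as a mean absolute deviation (MAD) between predicted and experimental chemical shifts. As a minimal sketch of that metric only, assuming paired lists of shifts in ppm, the calculation could look like the following. The function name `mean_absolute_deviation` and the sample values are hypothetical and invented for illustration; they are not data or code from the paper.

```python
from statistics import fmean


def mean_absolute_deviation(predicted, reference):
    """Return the mean of |predicted - reference| over paired shifts (ppm)."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not predicted:
        raise ValueError("at least one chemical shift is required")
    return fmean([abs(p - r) for p, r in zip(predicted, reference)])


# Hypothetical H-1 shifts in ppm, made up for this example (not from the paper).
predicted_h1 = [7.50, 5.90, 4.40, 2.10]
experimental_h1 = [7.25, 6.15, 4.65, 2.35]

mad_h1 = mean_absolute_deviation(predicted_h1, experimental_h1)
print(f"H-1 MAD: {mad_h1:.2f} ppm")
```

The same function would apply unchanged to C-13 shifts; only the input lists differ.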
Pages: 6632-6642
Page count: 11