Chemical space-informed machine learning models for rapid predictions of x-ray photoelectron spectra of organic molecules

Cited by: 0
Authors
Tripathy, Susmita [1]
Das, Surajit [1]
Jindal, Shweta [1]
Ramakrishnan, Raghunathan [1]
Affiliations
[1] Tata Inst Fundamental Res, Hyderabad 500046, India
Source
MACHINE LEARNING-SCIENCE AND TECHNOLOGY | 2024 / Volume 5 / Issue 04
Keywords
x-ray photoelectron spectra; core-electron binding energy; density functional theory; machine learning; chemical space; LEVEL BINDING-ENERGIES; QUANTUM-CHEMISTRY; XPS SPECTRA; APPROXIMATION; SPECTROSCOPY; STATES; ATOMS;
DOI
10.1088/2632-2153/ad871d
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We present machine learning models based on kernel-ridge regression for predicting x-ray photoelectron spectra of organic molecules originating from the K-shell ionization energies of carbon (C), nitrogen (N), oxygen (O), and fluorine (F) atoms. We constructed the training dataset through high-throughput calculations of K-shell core-electron binding energies (CEBEs) for 12 880 small organic molecules in the bigQM7ω dataset, employing the Δ-SCF formalism coupled with meta-GGA-DFT and a variationally converged basis set. The models are cost-effective, as they require the atomic coordinates of a molecule generated using universal force fields while estimating the target-level CEBEs corresponding to DFT-level equilibrium geometry. We explore transfer learning by utilizing the atomic environment feature vectors learned using a graph neural network framework in kernel-ridge regression. Additionally, we enhance accuracy within the Δ-machine learning framework by leveraging inexpensive baseline spectra derived from Kohn-Sham eigenvalues. When applied to 208 combinatorially substituted uracil molecules larger than those in the training set, our analyses suggest that the models may not provide quantitatively accurate predictions of CEBEs but offer a strong linear correlation relevant for virtual high-throughput screening. We present the dataset and models as the Python module, cebeconf, to facilitate further explorations.
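The combination of kernel-ridge regression with a baseline correction described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the descriptors, the baseline, and the target values are synthetic stand-ins, the Gaussian kernel and its hyperparameters are arbitrary choices, and none of the function names come from the cebeconf module. The model is trained on the difference between a target value and a cheap baseline, and the prediction adds the learned correction back to the baseline.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    # Pairwise squared Euclidean distances between rows of A and rows of B
    d2 = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))

def krr_fit(X, y, sigma, lam):
    # Solve (K + lam * I) alpha = y for the regression coefficients
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def krr_predict(X_train, alpha, X_new, sigma):
    return gaussian_kernel(X_new, X_train, sigma) @ alpha

# Synthetic stand-ins (hypothetical, not data from the paper)
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 3))   # mock atomic-environment descriptors
baseline = 290.0 + 2.0 * X[:, 0]            # mock inexpensive baseline energy (eV)
target = baseline + 0.5 * np.sin(3.0 * X[:, 1]) + 0.2 * X[:, 2]**2  # mock target energy (eV)

# Train on the first 150 points, test on the remaining 50
X_tr, X_te = X[:150], X[150:]
delta_tr = (target - baseline)[:150]        # the model learns only the correction

alpha = krr_fit(X_tr, delta_tr, sigma=1.0, lam=1e-6)
pred = baseline[150:] + krr_predict(X_tr, alpha, X_te, sigma=1.0)

mae_delta = np.mean(np.abs(pred - target[150:]))
mae_baseline = np.mean(np.abs(baseline[150:] - target[150:]))
print(f"baseline-only MAE: {mae_baseline:.3f} eV, corrected MAE: {mae_delta:.3f} eV")
```

In this toy setting the learned correction is smooth, so the corrected predictions should track the mock target more closely than the baseline alone; the sketch says nothing about the accuracy reported for the actual models.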
Pages: 17
Related papers
41 in total
  • [31] Revving up 13C NMR shielding predictions across chemical space: benchmarks for atoms-in-molecules kernel machine learning with new data for 134 kilo molecules
    Gupta, Amit
    Chakraborty, Sabyasachi
    Ramakrishnan, Raghunathan
    MACHINE LEARNING-SCIENCE AND TECHNOLOGY, 2021, 2 (03)
  • [32] Experiment-Driven Atomistic Materials Modeling: A Case Study Combining X-Ray Photoelectron Spectroscopy and Machine Learning Potentials to Infer the Structure of Oxygen-Rich Amorphous Carbon
    Zarrouk, Tigany
    Ibragimova, Rina
    Bartok, Albert P.
    Caro, Miguel A.
    JOURNAL OF THE AMERICAN CHEMICAL SOCIETY, 2024, 146 (21) : 14645 - 14659
  • [33] Accurate, affordable, and generalizable machine learning simulations of transition metal x-ray absorption spectra using the XANESNET deep neural network
    Rankine, C. D.
    Penfold, T. J.
    JOURNAL OF CHEMICAL PHYSICS, 2022, 156 (16)
  • [34] Retrieving the Quantitative Chemical Information at Nanoscale from Scanning Electron Microscope Energy Dispersive X-ray Measurements by Machine Learning
    Jany, B. R.
    Janas, A.
    Krok, F.
    NANO LETTERS, 2017, 17 (11) : 6520 - 6525
  • [35] Absorption of Hydrocarbons on Palladium Catalysts: From Simple Models Towards Machine Learning Analysis of X-ray Absorption Spectroscopy Data
    Usoltsev, Oleg A.
    Bugaev, Aram L.
    Guda, Alexander A.
    Guda, Sergey A.
    Soldatov, Alexander V.
    TOPICS IN CATALYSIS, 2020, 63 (1-2) : 58 - 65
  • [37] Influence of device configuration and noise on a machine learning predictor for the selection of nanoparticle small-angle X-ray scattering models
    Monge, Nicolas
    Amini, Massih Reza
    Deschamps, Alexis
    ACTA CRYSTALLOGRAPHICA A-FOUNDATION AND ADVANCES, 2024, 80 : 405 - 413
  • [38] Identification of chemical species on plasma-treated polytetrafluoroethylene surface by ab-initio calculations of core-energy-level shift in X-ray photoelectron spectra
    Nishino, Misa
    Inagaki, Kouji
    Morikawa, Yoshitada
    Yamamura, Kazuya
    Ohkubo, Yuji
    APPLIED SURFACE SCIENCE, 2024, 655
  • [39] Random forest machine learning models for interpretable X-ray absorption near-edge structure spectrum-property relationships
    Torrisi, Steven B.
    Carbone, Matthew R.
    Rohr, Brian A.
    Montoya, Joseph H.
    Ha, Yang
    Yano, Junko
    Suram, Santosh K.
    Hung, Linda
    NPJ COMPUTATIONAL MATERIALS, 2020, 6 (01)
  • [40] Optimization of Material Composition of Li-Intercalated Metal-Organic Framework Electrodes Using a Combination of Experiments and Machine Learning of X-Ray Diffraction Patterns
    Hazama, Hirofumi
    Murai, Daisuke
    Nagasako, Naoyuki
    Hasegawa, Masaki
    Ogihara, Nobuhiro
    ADVANCED MATERIALS TECHNOLOGIES, 2020, 5 (09)