Prediction of Natural Product Classes Using Machine Learning and 13C NMR Spectroscopic Data

Cited by: 38
Authors
Martinez-Trevino, Saul H. [1]
Uc-Cetina, Victor [2]
Fernandez-Herrera, Maria A. [1]
Merino, Gabriel [1]
Affiliations
[1] Ctr Invest & Estudios Avanzados, Dept Fis Aplicada, Merida 97310, Yucatan, Mexico
[2] Univ Autonoma Yucatan, Fac Matemat, Merida 97119, Yucatan, Mexico
Keywords
AUTOMATED STRUCTURE ELUCIDATION; TRITERPENOID SAPONINS; GLYCOSIDES; COUMARIN; DEREPLICATION; SPECTRA; LIGNAN; PLANTS; LOGIC; L;
DOI
10.1021/acs.jcim.0c00293
Chinese Library Classification (CLC) number
R914 [Medicinal Chemistry]
Discipline classification code
100701
Abstract
Structure elucidation of chemical compounds is a complex and challenging activity that requires expertise and well-suited tools. To assign the molecular structure of a given compound, C-13 NMR is one of the most widely used techniques because of the broad range of structural information it provides. Because molecules found in nature can be grouped into natural product (NP) classes on the basis of structural similarities, we explore the possibility of NP class prediction via C-13 NMR data. Employing freely available C-13 NMR data of NPs, we trained four classifiers for the prediction of eight common NP classes. The best performance was obtained with the XGBoost classifier, reaching F1-scores above 0.82. We also performed experiments with different percentages of positive samples, including the presence of glycosides. Furthermore, we tested cases outside the data set, yielding performances above 80% for most classes. For the chromans case, we restricted the test examples to the coumarin subclass, and the prediction accuracy increased to 100%.
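The abstract reports classifier quality as F1-scores. As a minimal sketch of what that metric measures for a single NP class treated as a yes/no prediction, the following computes it from scratch. The label lists are made up for illustration and are not data from the paper, and the function name is hypothetical; the abstract does not describe how the authors computed or averaged their scores.

```python
def f1_score_binary(y_true, y_pred):
    """F1 for binary labels, where 1 means 'compound belongs to the class'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # wrongly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed members
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined),
        # so report 0.0 by convention.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: 4 class members, 4 non-members (illustrative only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
score = f1_score_binary(y_true, y_pred)
print(f"F1 = {score:.2f}")
```

Because F1 combines precision and recall, it stays informative when the fraction of positive samples changes, which is relevant to the experiments with different percentages of positive samples mentioned above.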
Pages: 3376-3386
Page count: 11