Exploring Machine Learning in Chemistry through the Classification of Spectra: An Undergraduate Project

Cited by: 14
Authors
St James, Alanah Grant [1]
Hand, Luke [1]
Mills, Thomas [1]
Song, Liwen [1]
Brunt, Annabel S. J. [1]
Mann, Patrick E. Bergstrom [2]
Worrall, Andrew F. [2]
Stewart, Malcolm I. [2]
Vallance, Claire [1]
Affiliations
[1] Univ Oxford, Dept Chem, Chem Res Lab, Oxford OX1 3TA, England
[2] Univ Oxford, Dept Chem, Chem Teaching Lab, Oxford OX1 3PS, England
Keywords
Upper-Division Undergraduate; Laboratory Instruction; Chemoinformatics; Interdisciplinary; Multidisciplinary; Computer-Based Learning; Chemometrics; Mass Spectrometry; Spectroscopy; Computational Chemistry
DOI
10.1021/acs.jchemed.2c00682
Chinese Library Classification (CLC) number
O6 [Chemistry]
Discipline classification code
0703
Abstract
Applications of machine learning in chemistry are many and varied, ranging from the prediction of structure-property relationships to the modeling of potential energy surfaces for large-scale atomistic simulations. We describe a generalized approach for applying machine learning to the classification of spectra that can serve as the basis for a wide variety of undergraduate projects. While our examples use FTIR and mass spectra, the approach could equally well be applied to UV-visible, Raman, NMR, or indeed any other type of spectra. We summarize a number of unsupervised and supervised machine learning algorithms that can be used to classify spectra into groups, and illustrate their application using data from three projects carried out by fourth-year chemistry undergraduates. The three projects investigated how well the various machine learning approaches could classify spectra of a variety of fruits, whiskies, and teas, respectively. In all cases the algorithms were able to differentiate between the samples used in each study, and the trained models could then be used to classify unknown samples with a high degree of accuracy (>98% in many cases). Depending on the extent to which students are expected to write their own data-analysis code, the general model adopted in this work can be adapted for a variety of purposes, from short (one- to two-day) practical exercises and workshops to much longer independent student projects.
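The abstract outlines the workflow (preprocess spectra, group them without labels, train on labelled spectra, then classify unknowns) but does not spell out code. The block below is a minimal sketch of one way such a pipeline could look, assuming only NumPy. It is not the authors' implementation: PCA followed by k-means stands in for "an unsupervised algorithm" and a nearest-centroid classifier for "a supervised algorithm", and these particular choices are assumptions. All function names (`make_synthetic_spectra`, `normalise`, `pca_fit`, `kmeans`, `nearest_centroid_fit`, etc.) are hypothetical, and the spectra are synthetic Gaussian peaks, not data from the fruit, whisky, or tea projects.

```python
import numpy as np

# Hypothetical helper: synthetic "spectra" built from Gaussian peaks plus noise.
# This is toy data, NOT data from the fruit, whisky, or tea projects.
def make_synthetic_spectra(n_per_class, peak_sets, n_points=500, width=0.01,
                           noise=0.02, seed=0):
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 1.0, n_points)
    spectra, labels = [], []
    for label, peaks in enumerate(peak_sets):
        for _ in range(n_per_class):
            y = np.zeros(n_points)
            for centre, height in peaks:
                # jitter each peak height by ~10% so samples within a class differ
                h = height * (1.0 + 0.1 * rng.standard_normal())
                y += h * np.exp(-((x - centre) ** 2) / (2 * width ** 2))
            y += noise * rng.standard_normal(n_points)
            spectra.append(y)
            labels.append(label)
    return np.array(spectra), np.array(labels)


def normalise(spectra):
    # Scale each spectrum to unit maximum so overall intensity does not dominate.
    return spectra / np.abs(spectra).max(axis=1, keepdims=True)


def pca_fit(X, n_components):
    # PCA via SVD of the mean-centred data; rows of vt are the principal axes.
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def pca_transform(X, mean, components):
    return (X - mean) @ components.T


def kmeans(Z, k, n_iter=100, seed=0):
    # Unsupervised step: Lloyd's k-means with random data points as initial centroids.
    rng = np.random.default_rng(seed)
    centroids = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        dist = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=2)
        assign = dist.argmin(axis=1)
        new = np.array([Z[assign == j].mean(axis=0) if np.any(assign == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return assign, centroids


def nearest_centroid_fit(Z, y):
    # Supervised step: one mean point per labelled class in PCA space.
    classes = np.unique(y)
    return classes, np.array([Z[y == c].mean(axis=0) for c in classes])


def nearest_centroid_predict(Z, classes, centroids):
    dist = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=2)
    return classes[dist.argmin(axis=1)]


# Usage: three toy "sample types", each defined by (peak position, peak height) pairs.
peak_sets = [[(0.2, 1.0), (0.5, 0.6)],
             [(0.3, 1.0), (0.7, 0.8)],
             [(0.2, 0.5), (0.8, 1.0)]]
X_train, y_train = make_synthetic_spectra(20, peak_sets, seed=1)
X_test, y_test = make_synthetic_spectra(10, peak_sets, seed=2)
X_train, X_test = normalise(X_train), normalise(X_test)

mean, comps = pca_fit(X_train, n_components=2)
Z_train = pca_transform(X_train, mean, comps)
Z_test = pca_transform(X_test, mean, comps)  # "unknowns" projected with the training PCA

clusters, _ = kmeans(Z_train, k=3)  # cluster labels are arbitrary integers 0..k-1
classes, cents = nearest_centroid_fit(Z_train, y_train)
accuracy = (nearest_centroid_predict(Z_test, classes, cents) == y_test).mean()
print(f"held-out accuracy: {accuracy:.2f}")
```

PCA is fitted on the training spectra only and then applied to the held-out spectra, mirroring the idea of using a trained model to classify unknown samples. In a real project the hand-written helpers would usually be replaced by library classes such as scikit-learn's `PCA`, `KMeans`, and `NearestCentroid`.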
Pages: 1343-1350
Page count: 8