DeepRel: Deep learning-based gas chromatographic retention index predictor

Cited by: 30
Authors
Vrzal, Tomas [1]
Maleckova, Michaela [1,2]
Olsovska, Jana [1]
Affiliations
[1] Res Inst Brewing & Malting Plc, Lipova 511-15, Prague 12044 2, Czech Republic
[2] Charles Univ Prague, Fac Sci, Dept Analyt Chem, Albertov 6, Prague 12843 2, Czech Republic
Keywords
Artificial intelligence; Convolutional network; Deep learning; Gas chromatography; Retention index; MODELS; SYSTEM;
DOI
10.1016/j.aca.2020.12.043
Chinese Library Classification number
O65 [Analytical Chemistry];
Subject classification codes
070302 ; 081704 ;
Abstract
The retention index is an essential tool for reliable analyte identification in gas chromatographic analyses. Many libraries currently provide retention indices for a large number of compounds on distinct stationary phase chemistries. However, the situation can be complicated in the case of "unknown unknowns", i.e. compounds not present in such libraries. The importance of identifying these compounds has risen together with the rapidly expanding interest in non-targeted analyses over the last decade. Precise in silico prediction of retention indices from a suggested molecular structure is therefore highly desirable in such situations. To this end, a predictive model based on deep learning was developed and is presented in this paper. It is designed for user-friendly and accurate prediction of retention indices of compounds in gas chromatography with a semi-standard non-polar stationary phase. The Simplified Molecular Input Line Entry System (SMILES) representation is used as the model's input. The architecture of the model consists of 2D-convolutional layers, together with batch normalization, max pooling, dropout, and three residual connections. The model reaches a median absolute error of retention index prediction of 16.4 and 16.0 units for the validation and test sets, respectively. The median percentage error is lower than or equal to 0.81% for all of the mentioned data sets. Finally, the DeepRel model is provided as an R package and is available at https://github.com/TomasVrzal/DeepRel together with a user-friendly graphical user interface. (C) 2020 Elsevier B.V. All rights reserved.
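As a minimal illustration of two ideas in the abstract, the Python sketch below shows (a) one way a SMILES string could be turned into a 2D matrix suitable for 2D-convolutional layers and (b) how the two reported error measures are conventionally computed. Everything here is an assumption made for illustration: the abstract does not describe the actual encoding, so the character-level one-hot scheme, the function names, the toy alphabet, and the example retention index values are hypothetical and are not taken from the DeepRel package.

```python
from statistics import median


def one_hot_smiles(smiles, alphabet, max_len):
    """Encode a SMILES string as a max_len x len(alphabet) one-hot matrix.

    Assumption: simple character-level tokens. Multi-character atoms such as
    'Cl' or 'Br' would need a proper tokenizer; characters missing from the
    alphabet raise KeyError.
    """
    index = {ch: i for i, ch in enumerate(alphabet)}
    matrix = [[0] * len(alphabet) for _ in range(max_len)]
    for row, ch in enumerate(smiles[:max_len]):
        matrix[row][index[ch]] = 1  # rows beyond the string stay all-zero (padding)
    return matrix


def median_absolute_error(y_true, y_pred):
    """Median of |true - predicted| in retention index units."""
    return median(abs(t - p) for t, p in zip(y_true, y_pred))


def median_percentage_error(y_true, y_pred):
    """Median of |true - predicted| / true, expressed in percent."""
    return median(abs(t - p) / t * 100.0 for t, p in zip(y_true, y_pred))


# Hypothetical values, not data from the paper.
encoded_ethanol = one_hot_smiles("CCO", alphabet="CO", max_len=4)
true_ri = [1000.0, 1200.0, 1500.0]
predicted_ri = [1010.0, 1188.0, 1530.0]
mdae = median_absolute_error(true_ri, predicted_ri)
mdpe = median_percentage_error(true_ri, predicted_ri)
```

Medians are less sensitive to a few badly predicted compounds than means, which is one plausible reason to report them for a large and heterogeneous retention index data set.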
Pages: 64-71
Number of pages: 8