A comparative study of the predictive performance of different descriptor calculation tools: Molecular-based elution order modeling and interpretation of retention mechanism for isomeric compounds from METLIN database

Cited by: 1
Authors
Obradovic, Darija [1]
Stavrianidi, Andrey [2, 3]
Fedorova, Elizaveta [3]
Bogojevic, Aleksandar [1]
Shpigun, Oleg [2]
Buryak, Aleksey [3]
Lazovic, Sasa [1]
Affiliations
[1] Pregrevica 118, Belgrade 11080, Serbia
[2] Lomonosov Moscow State Univ, Chem Dept, 1-3 Leninskie Gory,GSP-1, Moscow 119991, Russia
[3] Russian Acad Sci, AN Frumkin Inst Phys Chem & Electrochem, 31 Leninsky Prospect, Moscow 119071, Russia
Funding
Russian Science Foundation
Keywords
RP-HPLC; RP retention mechanism; Molecular-based modeling; Elution order prediction; Descriptor selection; LIQUID-CHROMATOGRAPHY; HPLC METHOD; IMPURITIES; SELECTION; SERIES; TIME
DOI
10.1016/j.chroma.2024.464731
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
In the pharmaceutical industry, the need for analytical standards is a bottleneck for comprehensive evaluation and quality control of intermediate and end products. These are complex mixtures containing structurally related molecules. In this regard, chromatographic peak annotation, especially for critical pairs of isomers and closest structural analogs, can be supported by using a Quantitative Structure Retention Relationship (QSRR) approach. In our study, we investigated the fundamental basis of the reversed-phase (RP) retention mechanism for 1141 isomeric compounds from the METLIN SMRT dataset. Nine different descriptor calculation tools combined with different feature selection methods (genetic algorithm (GA), stepwise, Boruta) and machine learning (ML) approaches (support vector machine (SVM), multiple linear regression (MLR), random forest (RF), XGBoost) were applied to provide a reliable molecular structure-based interpretation of RP retention behaviour of the isomeric compounds. Strict internal and external validation metrics were used to select models with the best predictive capabilities (r_test > 0.73, order of elution > 60%). For the developed models, mean absolute errors were in the range of 60 to 110 s. Stepwise and GA showed the most suitable performance as descriptor selection methods, while SVM and XGBoost modeling gave satisfactory predictive characteristics in most cases. Validation performed on the published experimental data for structurally related pharmaceutical compounds confirmed the best accuracy of MLR modeling in combination with GA feature selection of general physico-chemical properties. The resulting models will be useful for the prediction of separation and identification of structurally related compounds in pharmaceutical analysis, providing a simultaneous understanding of the interaction mechanisms leading to their retention under RP conditions.
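The validation figures quoted in the abstract (a test-set correlation coefficient, a mean absolute error in seconds, and a percentage of correctly predicted elution order) can be illustrated with a minimal sketch. The abstract does not define how "order of elution" was scored, so the pairwise-concordance definition below is an assumption, and the retention times and function names are hypothetical values chosen only for illustration.

```python
from itertools import combinations
from math import sqrt


def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted retention times."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / sqrt(var_o * var_p)


def elution_order_accuracy(observed, predicted):
    """Percentage of compound pairs whose predicted order matches the observed order.

    Assumed definition (pairwise concordance); pairs with tied observed
    retention times are skipped because they have no defined order.
    """
    correct = 0
    total = 0
    for i, j in combinations(range(len(observed)), 2):
        d_obs = observed[i] - observed[j]
        if d_obs == 0:
            continue
        total += 1
        if d_obs * (predicted[i] - predicted[j]) > 0:
            correct += 1
    return 100.0 * correct / total


# Hypothetical retention times in seconds for four isomers (not from the paper).
observed = [300.0, 420.0, 510.0, 640.0]
predicted = [350.0, 400.0, 600.0, 580.0]

mae = mean_absolute_error(observed, predicted)
r_test = pearson_r(observed, predicted)
order_pct = elution_order_accuracy(observed, predicted)

print(f"MAE = {mae:.1f} s, r_test = {r_test:.3f}, elution order = {order_pct:.1f} %")
```

In this toy case one of the six pairs is predicted in the wrong order. The example also shows that a model can have a reasonable correlation and still swap a critical pair, which is why an order-based metric is reported alongside r_test and the mean absolute error.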
Pages: 10