Predicting liquid chromatographic retention times of peptides from the Drosophila melanogaster proteome by machine learning approaches

Cited by: 36
Authors
Tian, Feifei [1]
Yang, Li [1]
Lv, Fenglin [1]
Zhou, Peng [2]
Affiliations
[1] Chongqing Univ, Coll Bioengn, Chongqing 400044, Peoples R China
[2] Zhejiang Univ, Dept Chem, Hangzhou 310027, Peoples R China
Keywords
Least-squares support vector machine; Random forest; Gaussian process; Peptide; Liquid chromatography; Quantitative structure-retention relationship; PARTIAL LEAST-SQUARES; ARTIFICIAL NEURAL-NETWORKS; ESCHERICHIA-COLI PROTEOME; SUPPORT VECTOR MACHINE; CARLO CROSS-VALIDATION; QUANTITATIVE PREDICTION; PROTEASE DIGESTION; GAUSSIAN-PROCESSES; REGRESSION-MODELS; MS;
DOI
10.1016/j.aca.2009.04.010
Chinese Library Classification (CLC) number
O65 [Analytical Chemistry]
Discipline classification code
070302; 081704
Abstract
Three machine learning algorithms, namely least-squares support vector machine (LSSVM), random forest (RF) and Gaussian process (GP), were used to model the quantitative structure-retention relationship (QSRR) for predicting and explaining the retention behavior of proteome-wide peptides in reversed-phase liquid chromatography. Peptides were parameterized using the CODESSA approach, and 145 descriptors were obtained for each peptide, covering diverse structural information such as constitutional, topological, geometrical and physicochemical properties. On that basis, the nonlinear LSSVM, RF and GP methods, as well as a sophisticated linear method, partial least-squares (PLS) regression, were employed in QSRR model development. The stability and predictive power of the constructed models were confirmed through a series of systematic validations: internal cross-validation, external test and Monte Carlo cross-validation. The results show that regression models developed using nonlinear approaches such as LSSVM, RF and GP predict better than linear PLS models. Considering that the retention times used in this work were measured on different columns and thus carry a relatively large uncertainty (reproducibility within 7%), the optimal statistics obtained from GP modeling are satisfactory, with coefficients of determination (R²) of 0.894 and 0.866 for the training set and test set, respectively. (C) 2009 Elsevier B.V. All rights reserved.
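The Monte Carlo cross-validation and R² statistic named in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic, a one-descriptor least-squares line stands in for the paper's LSSVM, RF, GP and PLS models, and the function names (`fit_line`, `r_squared`, `monte_carlo_cv`), split count and test fraction are hypothetical rather than taken from the paper.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit y = a*x + b for a single descriptor."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def monte_carlo_cv(xs, ys, n_splits=100, test_fraction=0.3, seed=0):
    """Repeated random train/test splits; returns one test-set R^2 per split."""
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    n_test = max(2, int(len(xs) * test_fraction))
    scores = []
    for _ in range(n_splits):
        rng.shuffle(idx)  # a fresh random partition on every repetition
        test, train = idx[:n_test], idx[n_test:]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        preds = [a * xs[i] + b for i in test]
        scores.append(r_squared([ys[i] for i in test], preds))
    return scores


# Synthetic stand-in data (assumption): one descriptor, noisy linear retention.
data_rng = random.Random(42)
descriptor = [data_rng.uniform(0.0, 10.0) for _ in range(200)]
retention = [3.0 * x + 5.0 + data_rng.gauss(0.0, 2.0) for x in descriptor]

scores = monte_carlo_cv(descriptor, retention)
mean_r2 = sum(scores) / len(scores)
print(f"mean test R^2 over {len(scores)} random splits: {mean_r2:.3f}")
```

The spread of the per-split scores, not only their mean, is what such a procedure uses to judge model stability; swapping `fit_line` for a nonlinear regressor would leave the validation loop unchanged.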
Pages: 10-16
Page count: 7
Related papers
9 in total
  • [1] Prediction of liquid chromatographic retention times of peptides generated by protease digestion of the Escherichia coli proteome using artificial neural networks
    Shinoda, Kosaku
    Sugimoto, Masahiro
    Yachie, Nozomu
    Sugiyama, Naoyuki
    Masuda, Takeshi
    Robert, Martin
    Soga, Tomoyoshi
    Tomita, Masaru
    JOURNAL OF PROTEOME RESEARCH, 2006, 5 (12) : 3312 - 3317
  • [2] Comprehensive comparison of eight statistical modelling methods used in quantitative structure-retention relationship studies for liquid chromatographic retention times of peptides generated by protease digestion of the Escherichia coli proteome
    Zhou, Peng
    Tian, Feifei
    Lv, Fenglin
    Shang, Zhicai
    JOURNAL OF CHROMATOGRAPHY A, 2009, 1216 (15) : 3107 - 3116
  • [3] Machine learning for predicting retention times of chiral analytes chromatographically separated by CMPA technique
    Liu, Xiong
    Zhang, He
    Zhou, Wei
    Zhou, Yuying
    Zhang, Yuexin
    Cao, Xiaoliang
    Liu, Muqing
    Peng, Yingzi
    JOURNAL OF CHROMATOGRAPHY A, 2025, 1749
  • [4] Insights into predicting small molecule retention times in liquid chromatography using deep learning
    Liu, Yuting
    Yoshizawa, Akiyasu C.
    Ling, Yiwei
    Okuda, Shujiro
    JOURNAL OF CHEMINFORMATICS, 2024, 16 (01)
  • [5] A New Combination Strategy as Applied in Predicting Chromatographic Retention Times of Oligonucleotides at a Range of Temperatures from 30 °C to 80 °C
    Liang, Gui-Zhao
    Ma, Xiu-Yan
    Chen, Yu-Zhen
    Li, Yuan-Chao
    Lv, Feng-Li
    Yang, Li
    JOURNAL OF THE CHINESE CHEMICAL SOCIETY, 2011, 58 (01) : 75 - 82
  • [6] Improved workflow for constructing machine learning models: Predicting retention times and peak widths in oligonucleotide separation
    Samuelsson, Jorgen
    Enmark, Martin
    Szabados, Gergely
    Rahal, Manal
    Ahmed, Bestoun S.
    Haggstrom, Jakob
    Forssen, Patrik
    Fornstedt, Torgny
    JOURNAL OF CHROMATOGRAPHY A, 2025, 1747
  • [7] Comparing Deep Learning and Classical Machine Learning Approaches for Predicting Inpatient Violence Incidents from Clinical Text
    Menger, Vincent
    Scheepers, Floor
    Spruit, Marco
    APPLIED SCIENCES-BASEL, 2018, 8 (06)
  • [8] Predicting soil organic matter and soil moisture content from digital camera images: comparison of regression and machine learning approaches
    Taneja, Perry
    Vasava, Hiteshkumar Bhogilal
    Fathololoumi, Solmaz
    Daggupati, Prasad
    Biswas, Asim
    CANADIAN JOURNAL OF SOIL SCIENCE, 2022
  • [9] Comparing Machine Learning Approaches for Predicting Spatially Explicit Life Cycle Global Warming and Eutrophication Impacts from Corn Production
    Romeiko, Xiaobo Xue
    Guo, Zhijian
    Pang, Yulei
    Lee, Eun Kyung
    Zhang, Xuesong
    SUSTAINABILITY, 2020, 12 (04)