Probabilistic and Machine Learning Models for the Protein Scaffold Gap Filling Problem

Cited by: 0
Authors
Badal, Kushal [1]
Qingge, Letu [1]
Liu, Xiaowen [2]
Zhu, Binhai [3]
Affiliations
[1] North Carolina A&T State Univ, Dept Comp Sci, Greensboro, NC 27411 USA
[2] Tulane Univ, John W Deming Dept Med, New Orleans, LA USA
[3] Montana State Univ, Gianforte Sch Comp, Bozeman, MT USA
Source
BIOINFORMATICS RESEARCH AND APPLICATIONS, PT III, ISBRA 2024 | 2024 / Vol. 14956
Keywords
Protein sequencing; Protein scaffold filling; Machine learning; Probabilistic model; Heuristic algorithms
DOI
10.1007/978-981-97-5087-0_3
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In de novo protein sequencing, top-down and bottom-up tandem mass spectrometry often yields only an incomplete protein sequence, called a scaffold. While most sections of a protein can be inferred from its homologous sequences, some specific sections are always missing, and it is hard to predict the missing amino acids in the gaps of the scaffold. In this paper, we therefore focus only on predicting the gaps, using a probabilistic algorithm and machine learning models, instead of predicting the complete protein sequence with generative AI models. We study two versions of the protein scaffold filling problem: one with known-size gaps and one with known-mass gaps. For the known-size gap version, we develop several machine learning models based on random forest, k-nearest neighbors, decision tree, and fully connected neural network. For the known-mass gap version, we design a probabilistic algorithm to predict the missing amino acids in the gaps. Experimental results on both real and simulated data show that the proposed algorithms achieve promising accuracy of 100% and close to 100%.
Pages: 28-39
Number of pages: 12
Related papers
12 in total
[1]   Mass spectrometry-based proteomics [J].
Aebersold, R ;
Mann, M .
NATURE, 2003, 422 (6928) :198-207
[2]  
BRICAS E., 1965, BIOCHEMISTRY, V4, P2254, DOI 10.1021/bi00886a044
[3]   De Novo Sequencing of Antibody Light Chain Proteoforms from Patients with Multiple Myeloma [J].
Dupre, Mathieu ;
Duchateau, Magalie ;
Sternke-Hoffmann, Rebecca ;
Boquoi, Amelie ;
Malosse, Christian ;
Fenk, Roland ;
Haas, Rainer ;
Buell, Alexander K. ;
Rey, Martial ;
Chamot-Rooke, Julia .
ANALYTICAL CHEMISTRY, 2021, 93 (30) :10627-10634
[4]  
Kinter M., 2005, Protein Sequencing and Identification Using Tandem Mass Spectrometry
[5]   De Novo Protein Sequencing by Combining Top-Down and Bottom-Up Tandem Mass Spectra [J].
Liu, Xiaowen ;
Dekker, Lennard J. M. ;
Wu, Si ;
Vanduijn, Martijn M. ;
Luider, Theo M. ;
Tolic, Nikola ;
Kou, Qiang ;
Dvorkin, Mikhail ;
Alexandrova, Sonya ;
Vyatkina, Kira ;
Pasa-Tolic, Ljiljana ;
Pevzner, Pavel A. .
JOURNAL OF PROTEOME RESEARCH, 2014, 13 (07) :3241-3248
[6]  
National Center for Biotechnology Information, 2023, BLAST
[7]   Complete De Novo Assembly of Monoclonal Antibody Sequences [J].
Ngoc Hieu Tran ;
Rahman, M. Ziaur ;
He, Lin ;
Xin, Lei ;
Shan, Baozhen ;
Li, Ming .
SCIENTIFIC REPORTS, 2016, 6
[8]   Filling a Protein Scaffold With a Reference [J].
Qingge, Letu ;
Liu, Xiaowen ;
Zhong, Farong ;
Zhu, Binhai .
IEEE TRANSACTIONS ON NANOBIOSCIENCE, 2017, 16 (02) :123-130
[9]   Peptide and protein de novo sequencing by mass spectrometry [J].
Standing, KG .
CURRENT OPINION IN STRUCTURAL BIOLOGY, 2003, 13 (05) :595-601
[10]   A Convolutional Denoising Autoencoder for Protein Scaffold Filling [J].
Sturtz, Jordan ;
Annan, Richard ;
Zhu, Binhai ;
Liu, Xiaowen ;
Qingge, Letu .
BIOINFORMATICS RESEARCH AND APPLICATIONS, ISBRA 2023, 2023, 14248 :518-529