Re-Fraction: A Machine Learning Approach for Deterministic Identification of Protein Homologues and Splice Variants in Large-scale MS-based Proteomics

Cited by: 1
Authors
Yang, Pengyi [1, 2, 3]
Humphrey, Sean J. [1 ]
Fazakerley, Daniel J. [1 ]
Prior, Matthew J. [1 ]
Yang, Guang [1 ]
James, David E. [1 ]
Yang, Jean Yee-Hwa [3 ]
Affiliations
[1] St Vincents Hosp, Garvan Inst Med Res, Diabet & Obes Program, Darlinghurst, NSW 2010, Australia
[2] Univ Sydney, Sch Informat Technol, Sydney, NSW 2006, Australia
[3] Univ Sydney, Sch Math & Stat, Sydney, NSW 2006, Australia
Funding
Australian Research Council
Keywords
Proteomics; Machine learning; Protein Inference; Protein homologues; Splice variants; Isoforms; Mass spectrometry; MASS-SPECTROMETRY; STATISTICAL-MODEL; INFERENCE; ACCURACY;
DOI
10.1021/pr300072J
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
A key step in the analysis of mass spectrometry (MS)-based proteomics data is the inference of proteins from identified peptide sequences. Here we describe Re-Fraction, a novel machine learning algorithm that enhances deterministic protein identification. Re-Fraction utilizes several protein physical properties to assign proteins to expected protein fractions that comprise large-scale MS-based proteomics data. This information is then used to appropriately assign peptides to specific proteins. This approach is sensitive, highly specific, and computationally efficient. We provide algorithms and source code for the current version of Re-Fraction, which accepts output tables from the MaxQuant environment. Nevertheless, the principles behind Re-Fraction can be applied to other protein identification pipelines where data are generated from samples fractionated at the protein level. We demonstrate the utility of this approach through reanalysis of data from a previously published study and generate lists of proteins deterministically identified by Re-Fraction that were previously only identified as members of a protein group. We find that this approach is particularly useful in resolving protein groups composed of splice variants and homologues, which are frequently expressed in a cell- or tissue-specific manner and may have important biological consequences.
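The core idea in the abstract, using a protein's physical properties to decide which fraction it should appear in and then using that to break ties within a protein group, can be illustrated with a small sketch. This is not the authors' implementation: Re-Fraction is described as a machine learning algorithm using several physical properties, whereas the sketch below substitutes a single hand-written molecular-weight range rule. The names `Candidate` and `resolve_group`, the protein identifiers, and the weight ranges are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Candidate:
    """One member of an ambiguous protein group (hypothetical structure)."""
    protein_id: str
    mol_weight_kda: float


def resolve_group(
    candidates: Sequence[Candidate],
    fraction_range_kda: Tuple[float, float],
) -> Optional[str]:
    """Return the single candidate compatible with the observed fraction.

    Assumption: the fraction in which the shared peptides were observed
    corresponds to a molecular-weight window. A learned model would replace
    this simple range check in a real pipeline.
    """
    low, high = fraction_range_kda
    compatible = [c for c in candidates if low <= c.mol_weight_kda <= high]
    # Only resolve the group when exactly one member fits the fraction;
    # otherwise the group stays ambiguous.
    if len(compatible) == 1:
        return compatible[0].protein_id
    return None


# Two hypothetical splice variants sharing all identified peptides.
group = [Candidate("isoform_long", 95.0), Candidate("isoform_short", 42.0)]

resolved = resolve_group(group, (35.0, 50.0))    # only the short isoform fits
ambiguous = resolve_group(group, (30.0, 100.0))  # both fit, so no call is made
```

Returning `None` when zero or several candidates fit keeps the identification deterministic: a group is resolved only when the fraction evidence leaves exactly one candidate.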
Pages: 3035-3045
Page count: 11