EnvCNN: A Convolutional Neural Network Model for Evaluating Isotopic Envelopes in Top-Down Mass-Spectral Deconvolution

Cited by: 11
Authors
Basharat, Abdul Rehman [1]
Ning, Xia [3, 4]
Liu, Xiaowen [1, 2]
Affiliations
[1] Indiana Univ Purdue Univ, Sch Informat & Comp, Indianapolis, IN 46202 USA
[2] Indiana Univ Sch Med, Ctr Computat Biol & Bioinformat, Indianapolis, IN 46202 USA
[3] Ohio State Univ, Dept Biomed Informat, Columbus, OH 43210 USA
[4] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
Funding
National Institutes of Health (NIH), USA
Keywords
OPEN-SOURCE SOFTWARE; PROTEOMICS; SEARCH; TOOL; IDENTIFICATION; PROTEOFORM; ACCURACY;
DOI
10.1021/acs.analchem.0c00903
Chinese Library Classification number
O65 [Analytical Chemistry]
Discipline classification codes
070302; 081704
Abstract
Top-down mass spectrometry has become the main method for intact proteoform identification, characterization, and quantitation. Because of the complexity of top-down mass spectrometry data, spectral deconvolution is an indispensable step in spectral data analysis, which groups spectral peaks into isotopic envelopes and extracts monoisotopic masses of precursor or fragment ions. The performance of spectral deconvolution methods relies heavily on their scoring functions, which distinguish correct envelopes from incorrect ones. A good scoring function increases the accuracy of deconvoluted masses reported from mass spectra. In this paper, we present EnvCNN, a convolutional neural network-based model for evaluating isotopic envelopes. We show that the model outperforms other scoring functions in distinguishing correct envelopes from incorrect ones and that it increases the number of identifications and improves the statistical significance of identifications in top-down spectral interpretation.
Pages: 7778-7785
Number of pages: 8