Feature extraction method for proteins based on Markov tripeptide by compressive sensing

Cited by: 2
Authors
Gao, C. F. [1, 2]
Wu, X. Y. [1]
Affiliations
[1] Jiangnan Univ, Sch Sci, Wuxi 214122, Peoples R China
[2] Wuxi Engn Res Ctr Biocomp, Wuxi 214122, Peoples R China
Source
BMC BIOINFORMATICS | 2018 / Vol. 19
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China;
Keywords
Amino acid sequence; Proteins; Feature extraction; Compressive sensing; Markov transfer matrix; CLASSIFIER; IMAGES;
DOI
10.1186/s12859-018-2235-x
Chinese Library Classification
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Background: To capture the vital structural information of the original protein, the symbol sequence was transformed into a Markov frequency matrix built from each run of three consecutive residues along the chain. The resulting three-dimensional sparse matrix of size 20 x 20 x 20 was flattened into a one-dimensional vector. An appropriate measurement matrix was then applied to this vector to obtain a compressed feature set by random projection, yielding a new compressive sensing feature extraction technique. Results: Several indexes were analyzed on the cell membrane, cytoplasm, and nucleus dataset to assess the discriminative power of the features. Compared with the traditional methods of scale wavelet energy and amino acid components, the experimental results suggested that the features from the new method are advantageous and accurate. Conclusions: The new features extracted by this model could preserve the maximum information contained in the sequence and reflect the essential properties of the protein. The method is therefore a suitable and promising way to collect and process protein sequences from large, high-dimensional samples.
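The pipeline described in the abstract (tripeptide counting, a 20 x 20 x 20 sparse array, flattening, random projection) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the residue ordering, the normalization to relative frequencies, the Gaussian measurement matrix, the compressed dimension m = 64, and the example sequence are all assumptions made for the sketch.

```python
import numpy as np

# Assumed ordering of the 20 standard amino acids (not specified in the abstract).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def tripeptide_frequency(sequence):
    """Count every run of three consecutive residues into a 20 x 20 x 20 array."""
    counts = np.zeros((20, 20, 20))
    for a, b, c in zip(sequence, sequence[1:], sequence[2:]):
        # Skip triples containing non-standard symbols (an assumption).
        if a in INDEX and b in INDEX and c in INDEX:
            counts[INDEX[a], INDEX[b], INDEX[c]] += 1
    total = counts.sum()
    # Normalizing to relative frequencies is an assumption of this sketch.
    return counts / total if total > 0 else counts


def compress(vector, m=64, seed=0):
    """Random projection y = Phi @ x with a Gaussian measurement matrix (assumed)."""
    rng = np.random.default_rng(seed)
    phi = rng.standard_normal((m, vector.size)) / np.sqrt(m)
    return phi @ vector


seq = "MKTAYIAKQRQISFVKSHFSRQ"  # made-up sequence, for illustration only
tensor = tripeptide_frequency(seq)
x = tensor.reshape(-1)          # flatten to a vector of length 8000
y = compress(x)                 # compressed feature vector of length m
print(x.shape, y.shape)
```

The flattened vector has 8000 entries, but a sequence of length n contributes at most n - 2 nonzero ones, which is the sparsity that makes a low-dimensional random projection plausible as a feature set.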
Pages: 10
Related papers
21 in total
  • [1] Bioinformatics - Principles and potential of a new multidisciplinary tool
    Benton, D
    [J]. TRENDS IN BIOTECHNOLOGY, 1996, 14 (08) : 261 - 272
  • [2] BIAN Zhao-qi, 2000, Pattern recognition
  • [3] Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information
    Candès, EJ
    Romberg, J
    Tao, T
    [J]. IEEE TRANSACTIONS ON INFORMATION THEORY, 2006, 52 (02) : 489 - 509
  • [4] Decoding by linear programming
    Candès, EJ
    Tao, T
    [J]. IEEE TRANSACTIONS ON INFORMATION THEORY, 2005, 51 (12) : 4203 - 4215
  • [5] Candès EJ, 2008, IEEE SIGNAL PROC MAG, V25, P21, DOI 10.1109/MSP.2007.914731
  • [6] Stable signal recovery from incomplete and inaccurate measurements
    Candès, Emmanuel J.
    Romberg, Justin K.
    Tao, Terence
    [J]. COMMUNICATIONS ON PURE AND APPLIED MATHEMATICS, 2006, 59 (08) : 1207 - 1223
  • [7] Classification of Multicolor Fluorescence In Situ Hybridization (M-FISH) Images With Sparse Representation
    Cao, Hongbao
    Deng, Hong-Wen
    Li, Marilyn
    Wang, Yu-Ping
    [J]. IEEE TRANSACTIONS ON NANOBIOSCIENCE, 2012, 11 (02) : 111 - 118
  • [8] ProtDec-LTR2.0: an improved method for protein remote homology detection by combining pseudo protein and supervised Learning to Rank
    Chen, Junjie
    Guo, Mingyue
    Li, Shumin
    Liu, Bin
    [J]. BIOINFORMATICS, 2017, 33 (21) : 3473 - 3476
  • [9] Sound source localization using compressive sensing-based feature extraction and spatial sparsity
    Dehkordi, Mehdi Banitalebi
    Abutalebi, Hamid Reza
    Taban, Mohammad Reza
    [J]. DIGITAL SIGNAL PROCESSING, 2013, 23 (04) : 1239 - 1246
  • [10] Compressed sensing
    Donoho, DL
    [J]. IEEE TRANSACTIONS ON INFORMATION THEORY, 2006, 52 (04) : 1289 - 1306