iFeature: a Python package and web server for features extraction and selection from protein and peptide sequences

Cited by: 498
Authors
Chen, Zhen [1]
Zhao, Pei [2]
Li, Fuyi [3, 4]
Leier, Andre [5, 6]
Marquez-Lago, Tatiana T. [5, 6]
Wang, Yanan [7]
Webb, Geoffrey I. [8]
Smith, A. Ian [3, 4]
Daly, Roger J. [3, 4]
Chou, Kuo-Chen [9, 10]
Song, Jiangning [3, 4, 8]
Affiliations
[1] Qingdao Univ, Sch Basic Med Sci, 38 Dengzhou Rd, Qingdao 266021, Peoples R China
[2] CAAS, State Key Lab Cotton Biol, Inst Cotton Res, Anyang 455000, Peoples R China
[3] Monash Univ, Biomed Discovery Inst, Melbourne, Vic 3800, Australia
[4] Monash Univ, Dept Biochem & Mol Biol, Melbourne, Vic 3800, Australia
[5] Univ Alabama Birmingham, Sch Med, Dept Genet, Birmingham, AL USA
[6] Univ Alabama Birmingham, Sch Med, Dept Cell Dev & Integrat Biol, Birmingham, AL USA
[7] Shanghai Jiao Tong Univ, Inst Image Proc & Pattern Recognit, Shanghai, Peoples R China
[8] Monash Univ, Fac Informat Technol, Monash Ctr Data Sci, Melbourne, Vic 3800, Australia
[9] Gordon Life Sci Inst, Boston, MA 02478 USA
[10] Univ Elect Sci & Technol China, Sch Life Sci & Technol, Ctr Informat Biol, Chengdu 610054, Sichuan, Peoples R China
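Funding
National Institutes of Health (USA); Medical Research Council (UK); National Natural Science Foundation of China; Australian Research Council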
Keywords
AMINO-ACID-COMPOSITION; PHYSICOCHEMICAL FEATURES; SUBCELLULAR LOCATIONS; PREDICTION; PSEAAC; SITES; CLASSIFICATION; GENERATE; DATABASE; PROFEAT
DOI
10.1093/bioinformatics/bty140
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Structural and physicochemical descriptors extracted from sequence data have been widely used to represent sequences and predict structural, functional, expression and interaction profiles of proteins and peptides as well as DNAs/RNAs. Here, we present iFeature, a versatile Python-based toolkit for generating various numerical feature representation schemes for both protein and peptide sequences. iFeature is capable of calculating and extracting a comprehensive spectrum of 18 major sequence encoding schemes that encompass 53 different types of feature descriptors. It also allows users to extract specific amino acid properties from the AAindex database. Furthermore, iFeature integrates 12 different types of commonly used feature clustering, selection and dimensionality reduction algorithms, greatly facilitating training, analysis and benchmarking of machine-learning models. The functionality of iFeature is made freely available via an online web server and a stand-alone toolkit.
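To make the idea of a numerical feature representation concrete, the following is a minimal sketch of amino acid composition, one of the simplest sequence descriptors (it also appears among the keywords of this record). It is written in plain Python and does not use iFeature itself; the function name `amino_acid_composition`, the constant `AMINO_ACIDS`, and the choice to skip non-standard residues are assumptions made for illustration, not the package's interface or behavior.

```python
from collections import Counter
from typing import List

# The 20 standard amino acids in alphabetical one-letter order.
# (Illustrative constant; not taken from iFeature.)
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> List[float]:
    """Return the fraction of each standard amino acid in `sequence`.

    Hypothetical helper: residues outside the 20 standard letters are
    ignored, which is an assumption of this sketch.
    """
    counts = Counter(r for r in sequence.upper() if r in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    # One value per amino acid, so every sequence maps to a 20-dimensional vector.
    return [counts[aa] / total for aa in AMINO_ACIDS]


features = amino_acid_composition("ACDA")
print(len(features))   # number of features in the vector
print(features[:4])    # fractions for A, C, D, E
```

Because every sequence is mapped to a vector of the same length regardless of how long the sequence is, descriptors of this kind can be passed directly to standard machine-learning models.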
Pages: 2499-2502
Page count: 4