Pse-Analysis: a Python package for DNA/RNA and protein/peptide sequence analysis based on pseudo components and kernel methods

Cited by: 109
Authors
Liu, Bin [1 ,2 ,3 ]
Wu, Hao [1 ]
Zhang, Deyuan [4 ]
Wang, Xiaolong [1 ,2 ]
Chou, Kuo-Chen [3 ,5 ]
Affiliations
[1] Harbin Inst Technol, Sch Comp Sci & Technol, Shenzhen Grad Sch, Shenzhen, Guangdong, Peoples R China
[2] Harbin Inst Technol, Key Lab Network Oriented Intelligent Computat, Shenzhen Grad Sch, Shenzhen, Guangdong, Peoples R China
[3] Gordon Life Sci Inst, Boston, MA USA
[4] Shenyang Aerosp Univ, Sch Comp, Shenyang, Liaoning, Peoples R China
[5] Univ Elect Sci & Technol China, Ctr Informat Biol, Sch Life Sci & Technol, Key Lab Neuroinformat,Minist Educ, Chengdu, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
sequence analysis; pseudo components; support vector machine; genome/proteome analysis; AMINO-ACID-COMPOSITION; S-NITROSYLATION SITES; IDENTIFY RECOMBINATION SPOTS; FLEXIBLE WEB SERVER; K-TUPLE; ENSEMBLE CLASSIFIER; PHYSICOCHEMICAL PROPERTIES; N-6-METHYLADENOSINE SITES; TRANSLATION INITIATION; CELLULAR NETWORKING;
DOI
10.18632/oncotarget.14524
Chinese Library Classification
R73 [Oncology];
Discipline classification code
100214 ;
Abstract
To expedite genome/proteome analysis, we have developed a Python package called Pse-Analysis. The package automatically completes the following five procedures: (1) sample feature extraction, (2) optimal parameter selection, (3) model training, (4) cross validation, and (5) evaluation of prediction quality. The user only needs to input a benchmark dataset along with the query biological sequences concerned. Based on the benchmark dataset, Pse-Analysis automatically constructs an optimized predictor and then yields the predicted results for the submitted query samples. Moreover, multiprocessing was adopted to increase computational speed about 6-fold. The Pse-Analysis Python package is freely accessible to the public at http://bioinformatics.hitsz.edu.cn/Pse-Analysis/, and can be run directly on Windows, Linux, and Unix.
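The five procedures listed in the abstract can be illustrated with a minimal, standard-library-only Python sketch. This is not the Pse-Analysis API: every function name here is hypothetical, plain k-mer composition stands in for the pseudo-component features, and a nearest-centroid classifier is swapped in for the support vector machine named in the keywords, purely to keep the example self-contained. The toy benchmark sequences and labels are invented for illustration.

```python
from collections import Counter
from itertools import product


def kmer_features(seq, k):
    """(1) Feature extraction: normalized k-mer frequencies of a DNA sequence."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = max(len(seq) - k + 1, 1)
    return [counts[m] / total for m in kmers]


def train_centroids(X, y):
    """(3) Model training: one mean feature vector per class
    (a simple stand-in for an SVM, chosen only for brevity)."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Assign x to the class whose centroid is nearest (squared Euclidean)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(sorted(centroids), key=lambda lab: dist(centroids[lab]))


def cross_validate(seqs, labels, k, folds=3):
    """(4) Cross validation and (5) quality evaluation: returns accuracy."""
    X = [kmer_features(s, k) for s in seqs]
    correct = 0
    for f in range(folds):
        train_idx = [i for i in range(len(X)) if i % folds != f]
        test_idx = [i for i in range(len(X)) if i % folds == f]
        model = train_centroids([X[i] for i in train_idx],
                                [labels[i] for i in train_idx])
        correct += sum(predict(model, X[i]) == labels[i] for i in test_idx)
    return correct / len(X)


def build_predictor(seqs, labels, k_values=(1, 2, 3)):
    """(2) Parameter selection: keep the k with the best cross-validated
    accuracy, then retrain on the whole benchmark dataset."""
    best_k = max(k_values, key=lambda k: cross_validate(seqs, labels, k))
    model = train_centroids([kmer_features(s, best_k) for s in seqs], labels)
    return best_k, model


# Toy benchmark dataset (invented): AT-rich = class 1, GC-rich = class 0.
benchmark_seqs = ["ATATTAAATT", "GCGCCGGGCC", "TTAATATAAT",
                  "CCGGCGCGGC", "AATTTATATA", "GGCCCGCGCG"]
benchmark_labels = [1, 0, 1, 0, 1, 0]
query_seqs = ["ATTATAATTA", "GCCGGCGCCG"]

best_k, model = build_predictor(benchmark_seqs, benchmark_labels)
predictions = [predict(model, kmer_features(q, best_k)) for q in query_seqs]
print(best_k, predictions)
```

The user-facing shape mirrors the abstract: a benchmark dataset and query sequences go in, and predictions come out, with feature extraction, parameter selection, training, and validation handled inside `build_predictor`.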
Pages: 13338-13343
Number of pages: 6