repDNA: a Python package to generate various modes of feature vectors for DNA sequences by incorporating user-defined physicochemical properties and sequence-order effects

Cited by: 235
Authors
Liu, Bin [1 ,2 ,3 ]
Liu, Fule [1 ]
Fang, Longyun [1 ]
Wang, Xiaolong [1 ,2 ]
Chou, Kuo-Chen [3 ,4 ]
Affiliations
[1] Harbin Inst Technol, Shenzhen Grad Sch, Sch Comp Sci & Technol, Shenzhen 518055, Guangdong, Peoples R China
[2] Harbin Inst Technol, Shenzhen Grad Sch, Key Lab Network Oriented Intelligent Computat, Shenzhen 518055, Guangdong, Peoples R China
[3] Gordon Life Sci Inst, Belmont, MA 02478 USA
[4] King Abdulaziz Univ, Ctr Excellence Genom Med Res, Jeddah 21589, Saudi Arabia
Funding
National Natural Science Foundation of China;
Keywords
TUPLE NUCLEOTIDE COMPOSITION; AMINO-ACID-COMPOSITION; PSEUDO; PREDICTOR; PROMOTERS; PSEKNC; PSEAAC;
DOI
10.1093/bioinformatics/btu820
Chinese Library Classification number
Q5 [Biochemistry];
Subject classification code
071010 ; 081704 ;
Abstract
To develop powerful computational predictors for identifying the biological features or attributes of DNAs, one of the most challenging problems is to find a suitable approach to effectively represent the DNA sequences. To facilitate the studies of DNAs and nucleotides, we developed a Python package called representations of DNAs (repDNA) for generating the widely used features reflecting the physicochemical properties and sequence-order effects of DNAs and nucleotides. There are three feature groups composed of 15 features. The first group calculates three nucleic acid composition features describing the local sequence information by means of k-mers; the second group calculates six autocorrelation features describing the level of correlation between two oligonucleotides along a DNA sequence in terms of their specific physicochemical properties; the third group calculates six pseudo nucleotide composition features, which can be used to represent a DNA sequence with a discrete model or vector yet still keep considerable sequence-order information via the physicochemical properties of its constituent oligonucleotides. In addition, these features can be easily calculated with repDNA from both the built-in and user-defined properties.
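As a rough illustration of the first two feature groups named in the abstract, the following minimal sketch computes a k-mer composition vector and a dinucleotide autocovariance in plain Python. It does not use or reproduce the repDNA API: the function names `kmer_composition` and `dinucleotide_autocovariance` are hypothetical, and the property table `GC_COUNT` (number of G/C bases in a dinucleotide) is a toy stand-in for a real physicochemical property, chosen only so the example is self-contained.

```python
from itertools import product

ALPHABET = "ACGT"

# Toy "property" (an assumption for illustration only): G/C count per dinucleotide.
GC_COUNT = {a + b: (a in "GC") + (b in "GC") for a, b in product(ALPHABET, repeat=2)}


def kmer_composition(sequence, k=2):
    """Return normalized frequencies of overlapping k-mers, in lexicographic order."""
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    seq = sequence.upper()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with ambiguous bases such as N
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[kmer] / total for kmer in kmers]


def dinucleotide_autocovariance(sequence, prop, lag=1):
    """Average product of mean-centred property values of dinucleotides `lag` apart."""
    seq = sequence.upper()
    values = [prop[seq[i:i + 2]] for i in range(len(seq) - 1)]
    n_pairs = len(values) - lag
    if n_pairs <= 0:
        raise ValueError("sequence too short for the requested lag")
    mean = sum(values) / len(values)
    return sum(
        (values[i] - mean) * (values[i + lag] - mean) for i in range(n_pairs)
    ) / n_pairs
```

For example, `kmer_composition("ACGT", k=1)` gives equal weight to each base, and `dinucleotide_autocovariance("ACGT", GC_COUNT, lag=1)` correlates the toy property between neighbouring dinucleotides. A user-defined property would be supplied as another dictionary keyed by the 16 dinucleotides.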
Pages: 1307-1309
Page count: 3