EnhancerPred: a predictor for discovering enhancers based on the combination and selection of multiple features

Cited by: 91
Authors
Jia, Cangzhi [1]
He, Wenying [1]
Affiliations
[1] Dalian Maritime Univ, Dept Math, 1 Linghai Rd, Dalian 116026, Peoples R China
Keywords
SEQUENCE-BASED PREDICTOR; AMINO-ACID-COMPOSITION; TRANSCRIPTIONAL ENHANCERS; PSEUDO; SIGNATURES; PSEKNC; RNA; DNA;
DOI
10.1038/srep38741
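Chinese Library Classification number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Subject classification code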
07 ; 0710 ; 09 ;
Abstract
Enhancers are cis elements that play an important role in regulating gene expression by enhancing it. Recent studies of modifications have revealed that enhancers form a large group of functional elements with many different subgroups, which differ in biological activity and in their regulatory effects on target genes. Several computational methods have been proposed as auxiliary tools to distinguish enhancers from other regulatory elements, but only one method has addressed clustering them into subgroups. In this study, we developed a predictor, EnhancerPred, to distinguish enhancers from nonenhancers and to determine enhancer strength. A two-step wrapper-based feature selection method was applied to high-dimensional feature vectors derived from bi-profile Bayes and pseudo-nucleotide composition. The combination of 104 features from bi-profile Bayes, 1 feature from nucleotide composition, and 9 features from pseudo-nucleotide composition yielded the best performance for distinguishing enhancers from nonenhancers, with an overall accuracy (Acc) of 77.39%. The combination of 89 features from bi-profile Bayes and 10 features from pseudo-nucleotide composition yielded the best performance for distinguishing strong from weak enhancers, with an overall Acc of 68.19%. The feature optimization process showed that a separate model is needed for identifying strong and weak enhancers.
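The abstract mentions wrapper-based feature selection scored by overall accuracy. As a rough illustration of that general idea only, the following is a minimal sketch of a greedy forward wrapper; it is not the paper's two-step procedure, which the abstract does not detail. The function names, the nearest-centroid scorer, and the toy data are all hypothetical and chosen for brevity.

```python
from typing import Callable, List, Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Overall accuracy: fraction of predictions that match the labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label sequences must be non-empty and equal in length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def centroid_score(X: List[List[float]], y: List[int], cols: List[int]) -> float:
    """Hypothetical scorer: training accuracy of a nearest-centroid
    classifier that sees only the feature columns listed in cols."""
    def centroid(label: int) -> List[float]:
        rows = [[r[c] for c in cols] for r, t in zip(X, y) if t == label]
        return [sum(col) / len(rows) for col in zip(*rows)]

    def sq_dist(a: List[float], b: List[float]) -> float:
        return sum((u - v) ** 2 for u, v in zip(a, b))

    c0, c1 = centroid(0), centroid(1)
    preds = []
    for r in X:
        v = [r[c] for c in cols]
        preds.append(1 if sq_dist(v, c1) < sq_dist(v, c0) else 0)
    return accuracy(y, preds)


def forward_select(n_features: int,
                   score_fn: Callable[[List[int]], float]) -> List[int]:
    """Greedy forward wrapper: repeatedly add the feature index that gives
    the highest score, and stop once no candidate improves the score."""
    selected: List[int] = []
    best = float("-inf")
    while len(selected) < n_features:
        candidates = [i for i in range(n_features) if i not in selected]
        scored = [(score_fn(selected + [i]), i) for i in candidates]
        top_score, top_idx = max(scored, key=lambda s: s[0])
        if top_score <= best:
            break
        selected.append(top_idx)
        best = top_score
    return selected


# Toy data (assumed): column 0 separates the classes, column 1 does not.
X = [[0.1, 5.0], [0.2, 1.0], [0.9, 5.0], [0.8, 1.0]]
y = [0, 0, 1, 1]
selected = forward_select(2, lambda cols: centroid_score(X, y, cols))
print(selected)  # [0]
```

In practice the score would come from cross-validation of the actual classifier rather than training accuracy; the scorer here is kept trivial so the selection loop is easy to follow.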
Pages: 7