A comparison study on feature selection of DNA structural properties for promoter prediction

Cited by: 31
Authors
Gan, Yanglan [1]
Guan, Jihong [1]
Zhou, Shuigeng [2, 3]
Affiliations
[1] Tongji Univ, Dept Comp Sci & Technol, Shanghai 200092, Peoples R China
[2] Fudan Univ, Shanghai Key Lab Intelligent Informat Proc, Shanghai 200433, Peoples R China
[3] Fudan Univ, Sch Comp Sci, Shanghai 200433, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
TRANSCRIPTION START SITES; HUMAN GENOME; CORE PROMOTER; SEQUENCE; STABILITY; LOCATION; DATABASE; WIDE; CLASSIFICATION; IDENTIFICATION;
DOI
10.1186/1471-2105-13-4
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
Background: Promoter prediction is an integral step in understanding gene regulation and annotating genomes. Traditional promoter analysis is mainly based on sequence compositional features. Recently, many kinds of structural features have been employed in promoter prediction. However, because of high dimensionality and the risk of overfitting, it is infeasible to use all available features for promoter prediction. It is therefore necessary to select appropriate features for the prediction task.
Results: This paper conducts an extensive comparison study on feature selection of DNA structural properties for promoter prediction. First, to examine whether promoters possess special structures, we carry out a systematic comparison of the profiles of thirteen structural features on promoter and non-promoter sequences. Second, we investigate the correlations between these structural features and promoter sequences. Third, both filter and wrapper methods are used to select appropriate feature subsets from the thirteen kinds of structural features for promoter prediction, and the predictive power of the selected feature subsets is evaluated. Finally, we compare the prediction performance of the feature subsets selected in this paper with nine existing promoter prediction approaches.
Conclusions: Experimental results show that the structural features are differentially correlated with promoters. Specifically, DNA-bending stiffness, DNA denaturation and energy-related features are highly correlated with promoters. The predictive power for promoter sequences differs greatly among structural features. Selecting the relevant features can significantly improve the accuracy of promoter prediction.
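The abstract distinguishes filter from wrapper feature selection without describing either. The sketch below illustrates the general difference only and is not the authors' procedure: the filter step ranks features by absolute Pearson correlation with the class label, and the wrapper step runs greedy forward selection scored by the leave-one-out accuracy of a nearest-centroid classifier. The scoring function, the classifier, and the three-column data from `toy_data` are all assumptions made for illustration; the values are synthetic and are not real DNA structural measurements.

```python
import statistics


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def filter_rank(X, y):
    """Filter method: rank feature indices by |correlation with the label|.

    No classifier is involved; each feature is scored on its own.
    """
    n_features = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


def loo_accuracy(X, y, feats):
    """Leave-one-out accuracy of a nearest-centroid classifier on `feats`."""
    correct = 0
    for i in range(len(X)):
        centroids = {}
        for label in set(y):
            rows = [X[k] for k in range(len(X)) if k != i and y[k] == label]
            centroids[label] = [statistics.fmean([r[j] for r in rows]) for j in feats]
        point = [X[i][j] for j in feats]
        pred = min(
            centroids,
            key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
        )
        correct += pred == y[i]
    return correct / len(X)


def wrapper_forward(X, y):
    """Wrapper method: greedy forward selection driven by classifier accuracy.

    A feature is added only while it strictly improves the accuracy.
    """
    selected, best = [], 0.0
    remaining = set(range(len(X[0])))
    while remaining:
        cand = max(sorted(remaining), key=lambda j: loo_accuracy(X, y, selected + [j]))
        acc = loo_accuracy(X, y, selected + [cand])
        if acc <= best:
            break
        selected.append(cand)
        remaining.remove(cand)
        best = acc
    return selected, best


def toy_data(n=20):
    """Synthetic data (hypothetical, not real structural features).

    Column 0 separates the classes, column 1 is independent of the label,
    column 2 is weakly related to the label.
    """
    y = [i % 2 for i in range(n)]
    X = [[2 * y[i] + 0.1 * (i % 3), 0.3 * (i % 5), float((3 * i) % 4)] for i in range(n)]
    return X, y


if __name__ == "__main__":
    X, y = toy_data()
    print("filter ranking:", filter_rank(X, y))
    print("wrapper selection:", wrapper_forward(X, y))
```

The filter step is cheap because it never trains a model, but it scores each feature in isolation. The wrapper step retrains the classifier for every candidate subset, which costs more and ties the selection to the chosen classifier.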
Pages: 12