70ProPred: a predictor for discovering sigma70 promoters based on combining multiple features

Cited by: 73
Authors
He, Wenying [1]
Jia, Cangzhi [2]
Duan, Yucong [3]
Zou, Quan [1]
Affiliations
[1] Tianjin Univ, Sch Comp Sci & Technol, Tianjin 300072, Peoples R China
[2] Dalian Maritime Univ, Dept Math, Dalian 116026, Peoples R China
[3] Hainan Univ, Coll Informat & Technol, Haikou 570228, Hainan, Peoples R China
Keywords
sigma70; promoter; PSTNPSS; PseEIIP; SVM; SEQUENCE-BASED PREDICTOR; BI-PROFILE BAYES; RECOMBINATION SPOTS; FEATURE-EXTRACTION; ESCHERICHIA-COLI; K-TUPLE; SITES; PROTEINS; DISCRIMINATION; TRANSCRIPTION;
DOI
10.1186/s12918-018-0570-1
Chinese Library Classification
Q [Biological Sciences];
Discipline classification codes
07 ; 0710 ; 09 ;
Abstract
Background: A promoter is an important regulatory sequence element that is responsible for initiating gene transcription. In prokaryotes, sigma70 promoters regulate the transcription of most genes. Promoter recognition is a crucial part of gene structure recognition and a core issue in constructing gene transcriptional regulation networks. Even with genome sequencing successfully completed for an increasing number of microbial species, accurately identifying sigma70 promoter regions in DNA sequences remains difficult. Results: To improve the prediction accuracy of sigma70 promoters in prokaryotes, a promoter recognition model, 70ProPred, was established. In this work, two sequence-based features, position-specific trinucleotide propensity based on single-stranded characteristic (PSTNPSS) and electron-ion interaction potential values for trinucleotides (PseEIIP), were assessed to build the best prediction model. Combining 79 PSTNPSS features with 64 PseEIIP features gave the best performance for sigma70 promoter identification, with an accuracy of 95.56% and a Matthews correlation coefficient (MCC) of 0.90. Conclusion: Jackknife tests showed that 70ProPred outperforms existing sigma70 promoter prediction approaches in terms of accuracy and stability. Additionally, this approach can be extended to predict promoters of other species. For the convenience of experimental biologists, an online web server for the proposed method was established, which is freely available at http://server.malab.cn/70ProPred/.
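The 64 PseEIIP features mentioned in the abstract correspond to the 64 possible trinucleotides. A minimal sketch of how such a vector is commonly computed follows; it is not the authors' implementation. It assumes the nucleotide EIIP values usually quoted in the literature (A 0.1260, C 0.1340, G 0.0806, T 0.1335), takes a trinucleotide's EIIP as the sum of its three nucleotide values, and weights it by that trinucleotide's frequency in the sequence. The function name `pse_eiip` and the choice to skip windows containing non-ACGT characters are hypothetical.

```python
from itertools import product

# Nucleotide EIIP values as commonly quoted in the literature (assumption).
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}

# All 64 trinucleotides in a fixed order: AAA, AAC, AAG, ..., TTT.
TRINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=3)]


def pse_eiip(seq):
    """Return a 64-dimensional PseEIIP-style feature vector (hypothetical helper).

    Each entry is (sum of the EIIP values of the three nucleotides)
    multiplied by the trinucleotide's frequency in `seq`.
    """
    seq = seq.upper()
    counts = dict.fromkeys(TRINUCLEOTIDES, 0)
    total = 0
    # Slide a window of width 3 over the sequence, one base at a time.
    for i in range(len(seq) - 2):
        tri = seq[i:i + 3]
        if tri in counts:  # skip windows with ambiguous bases such as N
            counts[tri] += 1
            total += 1
    if total == 0:
        return [0.0] * len(TRINUCLEOTIDES)
    return [
        sum(EIIP[base] for base in tri) * counts[tri] / total
        for tri in TRINUCLEOTIDES
    ]


if __name__ == "__main__":
    features = pse_eiip("TTGACAATTAATCATCGGCTCGTATAATGTG")
    print(len(features))
```

In a pipeline like the one the abstract describes, a vector of this kind would be concatenated with the PSTNPSS features before being passed to the SVM; that step is not shown here.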
Pages: 9