An Evolutionary-based Approach for Feature Generation: Eukaryotic Promoter Recognition

Cited by: 0
Authors
Kamath, Uday [1]
De Jong, Kenneth A. [1]
Shehu, Amarda [1]
Affiliation
[1] George Mason Univ, Dept Comp Sci, Fairfax, VA 22030 USA
Source
2011 IEEE Congress on Evolutionary Computation (CEC) | 2011
Keywords
Promoter Prediction; Evolutionary Algorithms; Support Vector Machines; Feature Selection; Genetic Algorithms; Prediction; Classification; Information
DOI
Not available
Chinese Library Classification (CLC) codes
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract
Prediction of promoter regions continues to be a challenging subproblem in mapping out eukaryotic DNA. While this task is key to understanding the regulation of differential transcription, the gene-specific architecture of promoter sequences does not readily lend itself to general strategies. To date, the best approaches are based on Support Vector Machines (SVMs) that employ standard "spectrum" features and achieve promoter region classification accuracies from a low of 84% to a high of 94% depending on the particular species involved. In this paper, we propose a general and powerful methodology that uses Genetic Programming (GP) techniques to generate more complex and more gene-specific features to be used with a standard SVM for promoter region identification. We evaluate our methodology on three data sets from different species and observe consistent classification accuracies in the 94-95% range. In addition, because the GP-generated features are gene-specific, they can be used by biologists to advance their understanding of the architecture of eukaryotic promoter regions.
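The "spectrum" features mentioned in the abstract are commonly defined as the counts of every length-k substring (k-mer) of a DNA sequence, and the spectrum kernel as the inner product of two such count vectors. The sketch below assumes that standard definition; the function names, the default k=3, and the choice to ignore k-mers containing non-ACGT characters are illustrative assumptions, not details taken from the paper, and it does not reproduce the authors' GP-generated features.

```python
from collections import Counter
from itertools import product


def kmer_spectrum(seq: str, k: int = 3, alphabet: str = "ACGT") -> list[int]:
    """Count vector of all len(alphabet)**k possible k-mers, in lexicographic order.

    Assumption: k-mers containing characters outside `alphabet` (e.g. 'N')
    are counted by the Counter but simply never looked up, so they are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts["".join(p)] for p in product(alphabet, repeat=k)]


def spectrum_kernel(a: str, b: str, k: int = 3) -> int:
    """Spectrum kernel: inner product of the two sequences' k-mer count vectors."""
    return sum(x * y for x, y in zip(kmer_spectrum(a, k), kmer_spectrum(b, k)))


if __name__ == "__main__":
    v = kmer_spectrum("ACGTAC", k=2)
    print(len(v), sum(v))                             # 16 possible 2-mers, 5 observed
    print(spectrum_kernel("ACGTAC", "ACGTAC", k=2))   # self-similarity
```

Vectors like these (or the kernel matrix they induce) are what a standard SVM would consume; the paper's contribution, per the abstract, is to replace or augment such fixed k-mer counts with more complex features evolved by GP.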
Pages: 277-284 (8 pages)