A novel approach to estimation of E-coli promoter gene sequences: Combining feature selection and least square support vector machine (FS_LSSVM)

Cited by: 20
Authors
Polat, Kemal [1]
Guenes, Salih [1]
Affiliation
[1] Selcuk Univ, Dept Elect & Elect Engn, TR-42075 Konya, Turkey
Keywords
E. coli promoter gene sequences; feature selection; LSSVM classifier; estimation
DOI
10.1016/j.amc.2007.02.033
Chinese Library Classification (CLC) number
O29 [Applied Mathematics]
Subject classification code
070104
Abstract
In this paper, we investigate the real-world task of recognizing biological concepts in DNA sequences. Promoters in strings that represent nucleotides (one of A, G, T, or C) are recognized using a novel approach that combines feature selection (FS) with a least square support vector machine (LSSVM). The Escherichia coli promoter gene sequences dataset has 57 attributes and 106 samples, comprising 53 promoters and 53 non-promoters. The proposed system consists of two parts. First, the FS process reduces the dimensionality of the dataset from 57 attributes to 4. Second, the LSSVM classifier is run to estimate the E. coli promoter gene sequences. To assess the performance of the proposed system, we used the success rate, sensitivity and specificity analysis, 10-fold cross validation, and the confusion matrix. Whereas the LSSVM classifier alone obtained an 80% success rate using 10-fold cross validation, the proposed system obtained a 100% success rate under the same conditions. These results indicate that the proposed approach improves the success rate in recognizing promoters in strings that represent nucleotides. (C) 2007 Elsevier Inc. All rights reserved.
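The evaluation measures named in the abstract (success rate, sensitivity, specificity) can all be read off a confusion matrix. The following is a minimal sketch in plain Python, not the authors' code: the function names `confusion_counts` and `summarize` are hypothetical, and the example counts are made up for illustration on a 53/53 class split like the one described; they are not results from the paper.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false negatives, false positives, true negatives."""
    tp = fn = fp = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth == positive:
            fn += 1
        elif pred == positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


def summarize(tp, fn, fp, tn):
    """Derive success rate, sensitivity and specificity from the four counts."""
    total = tp + fn + fp + tn
    return {
        "success_rate": (tp + tn) / total,   # fraction of all samples classified correctly
        "sensitivity": tp / (tp + fn),       # fraction of promoters recognized
        "specificity": tn / (tn + fp),       # fraction of non-promoters recognized
    }


# Illustrative (made-up) counts: 53 promoters and 53 non-promoters,
# with 3 promoters missed and 5 non-promoters misclassified.
example = summarize(tp=50, fn=3, fp=5, tn=48)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Under 10-fold cross validation, the four counts would typically be accumulated over the ten held-out folds before calling `summarize`, so that every sample contributes exactly once.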
Pages: 1574-1582
Number of pages: 9