Application of Intelligent Techniques for Classification of Bacteria Using Protein Sequence-Derived Features

Cited by: 5
Authors
Banerjee, Amit Kumar [1]
Ravi, Vadlamani [2]
Murty, U. S. N. [1]
Sengupta, Neelava [1]
Karuna, Batepatti [1]
Affiliations
[1] Indian Inst Chem Technol CSIR, Div Biol, Bioinformat Grp, Hyderabad, Andhra Pradesh, India
[2] Inst Dev & Res Banking Technol IDRBT, Hyderabad, Andhra Pradesh, India
Keywords
Histidine kinase; Classification; Datamining; Physicochemical property; Support vector machine; Radial basis function; MACHINE-LEARNING APPROACH; HISTIDINE KINASE; PHYSICOCHEMICAL PROPERTIES;
DOI
10.1007/s12010-013-0268-1
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Standard molecular experimental methodologies and mathematical procedures often fail to resolve many phylogeny- and classification-related questions. Modern artificial intelligence-based techniques, such as radial basis function, genetic algorithm, artificial neural network, and support vector machine methods, have ample potential in this regard. Relying on a large number of essential parameters, rather than a single molecular parameter, should improve robustness, reliability, and accuracy. This study used a dataset of computed protein physicochemical properties from 20 different bacterial genera. A total of 57 sequential and structural parameters derived from protein sequences were considered for the initial classification. Feature selection techniques were employed to identify the most important features influencing the dataset. Various amino acids, hydrophobicity, relative sulfur percentage, and codon number were selected as important parameters. Comparative analyses were performed using the RapidMiner data mining platform. The support vector machine proved to be the best method, with a maximum accuracy of more than 91%.
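As a rough illustration of the kind of sequence-derived features the abstract mentions (amino acid composition, hydrophobicity, and a sulfur-related percentage), the following minimal sketch computes a few of them from a protein sequence. It is not the authors' pipeline: the function name `sequence_features`, the feature names, the use of the Kyte-Doolittle hydropathy scale, and the definition of the sulfur feature as the percentage of cysteine and methionine residues are all assumptions made for this example.

```python
from collections import Counter

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy scale (assumed here; the record does not
# say which hydrophobicity scale the study used).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def sequence_features(sequence):
    """Return a dict of simple physicochemical features (hypothetical helper)."""
    # Keep only standard residues; ambiguous codes such as X are skipped.
    residues = [aa for aa in sequence.upper() if aa in KYTE_DOOLITTLE]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    n = len(residues)
    counts = Counter(residues)

    # Percentage composition of each of the 20 amino acids.
    features = {f"pct_{aa}": 100.0 * counts[aa] / n for aa in AMINO_ACIDS}

    # Assumed stand-in for "relative sulfur percentage":
    # share of sulfur-containing residues (Cys and Met).
    features["pct_sulfur_residues"] = 100.0 * (counts["C"] + counts["M"]) / n

    # Mean hydropathy over the sequence (often called GRAVY).
    features["mean_hydropathy"] = sum(KYTE_DOOLITTLE[aa] for aa in residues) / n
    return features


if __name__ == "__main__":
    feats = sequence_features("MCAA")
    print(feats["pct_A"], feats["pct_sulfur_residues"], feats["mean_hydropathy"])
```

Feature vectors of this kind, one per protein, could then be passed to a feature selection step and a classifier such as a support vector machine; the study itself reports doing this in RapidMiner.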
Pages: 1263-1281
Page count: 19