Reaching optimized parameter set: protein secondary structure prediction using neural network

Cited by: 12
Authors
Dongardive, Jyotshna [1]
Abraham, Siby [2]
Affiliations
[1] Univ Mumbai, Dept Comp Sci, Bombay, Maharashtra, India
[2] Univ Mumbai, Dept Math & Stat, GN Khalsa Coll, Bombay, Maharashtra, India
Keywords
Multi-layer feed forward network; Learning algorithm; Proteins; Hidden neuron; Encoding scheme; Performance measures; Secondary structure prediction; SUPPORT VECTOR MACHINE; EVOLUTIONARY INFORMATION; MOLECULAR-STRUCTURE; BLOCKS DATABASE; AMINO-ACIDS; ALGORITHM; IMPROVEMENTS; RESOURCE; PROFILES; FAMILIES;
DOI
10.1007/s00521-015-2150-2
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We propose an optimized parameter set for protein secondary structure prediction using a three-layer feed-forward back-propagation neural network. The methodology uses four parameters, viz. the encoding scheme, the window size, the number of neurons in the hidden layer and the type of learning algorithm. The input layer of the network varies from 3 to 19 neurons, corresponding to different window sizes. The hidden layer takes a natural number from 1 to 20 as its number of neurons. The output layer consists of three neurons, one for each of the known secondary structural classes, viz. alpha-helix, beta-strand and coil/turn. The methodology also uses eight different learning algorithms and nine encoding schemes. Exhaustive experiments were performed using a non-homologous dataset. The experimental results were compared using performance measures such as Q3, sensitivity, specificity, Matthews correlation coefficient and accuracy. The paper also discusses the process of obtaining a stabilized cluster of 2530 records from a collection of 11,340 records. The graphs of these stabilized clusters of records with respect to accuracy are concave, the convergence is monotonically increasing and the rate of convergence is uniform. The paper gives BLOSUM62 as the encoding scheme, 19 as the window size, 19 as the number of neurons in the hidden layer and one-step secant as the learning algorithm, with the highest accuracy of 78%. These parameter values are proposed as the optimized parameter set for the three-layer feed-forward back-propagation neural network for protein secondary structure prediction.
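The performance measures named in the abstract can be illustrated with a minimal sketch. It assumes the commonly used definitions (Q3 as the fraction of correctly predicted residues over three states, and one-vs-rest sensitivity, specificity and Matthews correlation coefficient per class); the abstract does not state the paper's exact formulas, and the class labels `H`, `E`, `C`, the function names and the toy sequences below are hypothetical choices made only for illustration.

```python
import math

# Assumed three-state labels: H = alpha-helix, E = beta-strand, C = coil/turn.
CLASSES = ("H", "E", "C")


def q3(true_labels, pred_labels):
    """Fraction of residues whose three-state class is predicted correctly."""
    if len(true_labels) != len(pred_labels):
        raise ValueError("label sequences must have equal length")
    correct = sum(t == p for t, p in zip(true_labels, pred_labels))
    return correct / len(true_labels)


def per_class_measures(true_labels, pred_labels, cls):
    """One-vs-rest sensitivity, specificity and Matthews correlation for cls."""
    tp = fp = tn = fn = 0
    for t, p in zip(true_labels, pred_labels):
        if t == cls and p == cls:
            tp += 1
        elif t != cls and p == cls:
            fp += 1
        elif t == cls and p != cls:
            fn += 1
        else:
            tn += 1
    # Guard against empty denominators (class absent from the data).
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity, "mcc": mcc}


# Toy example (not data from the paper): observed vs. predicted states.
observed = "HHHHEEECCC"
predicted = "HHHCEEECCH"

overall_q3 = q3(observed, predicted)
by_class = {c: per_class_measures(observed, predicted, c) for c in CLASSES}

if __name__ == "__main__":
    print(f"Q3 = {overall_q3:.2f}")
    for c, m in by_class.items():
        print(c, {k: round(v, 3) for k, v in m.items()})
```

In a parameter sweep of the kind the abstract describes, a function like `q3` would be evaluated once per combination of encoding scheme, window size, hidden-layer size and learning algorithm, and the per-class measures would show whether a gain in overall accuracy comes at the cost of one structural class.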
Pages: 1947-1974
Number of pages: 28