Co-Clustering Analysis of Protein Secondary Structures

Cited by: 3
Authors
Ma, Lichun [1]
Wang, Debby D. [1]
Liu, Xinyu [1]
Zou, Bin [1]
Yan, Hong [1]
Affiliations
[1] City Univ Hong Kong, Dept Elect Engn, 2384 Fong Yun Wah Bldg, Kowloon, Hong Kong, Peoples R China
Keywords
Protein secondary structure; alpha-helix; beta-strand; co-clustering; clustering; structure prediction; biclustering algorithms; Porter; model
DOI
10.2174/1574893612666170111145319
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Protein secondary structure provides a crucial link between a protein's sequence and its final 3D structure, so accurate prediction of secondary structure is important.
Objective: This study aims to obtain a subset of highly regular features of protein secondary structures that can then be used to predict the secondary structures of other chains.
Method: The experimental data were obtained from the Dictionary of Protein Secondary Structure (DSSP), which defines eight types of secondary structure. We carried out a statistical analysis of the amino acids in each type and then focused on the alpha-helix and the beta-strand, the two most common regular secondary structures. Features of amino acids, neighbors, and hydrogen bonds (for alpha-helices) were extracted. A co-clustering based method was then applied to the alpha-helix and beta-strand chain-feature matrices separately.
Results and Conclusion: The features obtained from the co-clustering process can be used to predict the structures of other chains. The prediction performs well for beta-strands and long alpha-helices but poorly for short alpha-helices. We therefore represented the features of each short alpha-helix as a vector and made the prediction by comparing the testing vector with the training vectors in the co-clusters. The testing accuracy for short alpha-helices can reach 96% when amino acid features are used as the vector. The co-clustering based method can therefore predict the secondary structure of a protein sequence with high accuracy.
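The per-structure amino acid statistics and the vector comparison described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the toy sequence and state string, the use of a 20-dimensional amino acid frequency vector, and the cosine-similarity nearest-vector rule are all assumptions made for illustration, and the co-clustering step itself is not reproduced. The state letters follow the DSSP convention (H for alpha-helix, E for extended strand).

```python
from collections import Counter, defaultdict
from itertools import groupby
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition_by_state(sequence, states):
    """Count amino acids separately for each secondary structure state."""
    if len(sequence) != len(states):
        raise ValueError("sequence and states must have equal length")
    counts = defaultdict(Counter)
    for residue, state in zip(sequence, states):
        counts[state][residue] += 1
    return counts


def segments(sequence, states, target):
    """Return the residues of each contiguous run labelled `target`."""
    found, pos = [], 0
    for state, run in groupby(states):
        length = len(list(run))
        if state == target:
            found.append(sequence[pos:pos + length])
        pos += length
    return found


def composition_vector(segment):
    """Amino acid frequency vector (length 20) for one segment."""
    counts = Counter(segment)
    total = len(segment) or 1
    return [counts[a] / total for a in AMINO_ACIDS]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_label(test_vec, labelled_vecs):
    """Label of the training vector most similar to `test_vec` (assumed rule)."""
    return max(labelled_vecs, key=lambda item: cosine(test_vec, item[1]))[0]


# Toy data, invented for illustration only.
sequence = "MKAILEELLKKGSVTVEVRVNGT"
states = "--HHHHHHHHH--EEEEEEE---"

counts = composition_by_state(sequence, states)
helices = segments(sequence, states, "H")
strands = segments(sequence, states, "E")
training = [("helix", composition_vector(h)) for h in helices]
training += [("strand", composition_vector(s)) for s in strands]
predicted = nearest_label(composition_vector("ELLKKLAE"), training)
```

In a realistic setting the training vectors would come from many chains, and the comparison would be restricted to vectors within the same co-cluster, as the abstract describes.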
Pages: 213-224
Page count: 12