Role and results of statistical methods in protein fold class prediction

Cited by: 20
Authors
Edler, L
Grassmann, J
Suhai, S
Affiliations
[1] German Canc Res Ctr, Biostat Unit R0700, D-69120 Heidelberg, Germany
[2] German Canc Res Ctr, Dept Mol Biophys, D-69120 Heidelberg, Germany
Keywords
protein fold classes; regression; discrimination;
DOI
10.1016/S0895-7177(01)80022-4
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Statistical methods of discrimination and classification are used for the prediction of protein structure from amino acid sequence data. This provides information for the establishment of new paradigms of carcinogenesis modeling on the basis of gene expression. Feed-forward neural networks and standard statistical classification procedures are used to classify proteins into fold classes. Logistic regression, additive models, and projection pursuit regression, from the family of methods based on a posteriori probabilities; linear, quadratic, and flexible discriminant analysis, from the class of methods based on class-conditional probabilities; and the nearest-neighbor classification rule are applied to a data set of 268 sequences. From analyzing the prediction error obtained with a test sample (n = 125) and with a cross-validation procedure, we conclude that standard linear discriminant analysis and nearest-neighbor methods are statistically feasible and, at the same time, potent competitors to the more flexible tools of feed-forward neural networks. Further research is needed to explore the gain obtainable from statistical methods when applied to larger sets of protein sequence data, and to compare the results with those from biophysical approaches. © 2001 Elsevier Science Ltd. All rights reserved.
Pages: 1401-1417
Number of pages: 17