Assessing the accuracy of prediction algorithms for classification: an overview

Cited by: 1584
Authors
Baldi, P [1]
Brunak, S
Chauvin, Y
Andersen, CAF
Nielsen, H
Affiliations
[1] Univ Calif Irvine, Dept Informat & Comp Sci, Irvine, CA 92697 USA
[2] Tech Univ Denmark, Ctr Biol Sequence Anal, DK-2800 Lyngby, Denmark
[3] Net ID Inc, San Francisco, CA 94107 USA
[4] Univ Calif Irvine, Dept Biol Sci, Irvine, CA 92697 USA
DOI
10.1093/bioinformatics/16.5.412
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
We provide a unified overview of methods that are currently widely used to assess the accuracy of prediction algorithms, from raw percentages, quadratic error measures and other distances, and correlation coefficients, to information theoretic measures such as relative entropy and mutual information. We briefly discuss the advantages and disadvantages of each approach. For classification tasks, we derive new learning algorithms for the design of prediction systems by directly optimising the correlation coefficient. We observe and prove several results relating sensitivity and specificity of optimal systems. While the principles are general, we illustrate the applicability on specific problems such as protein secondary structure and signal peptide prediction.
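As a rough illustration of why the abstract distinguishes raw percentages from correlation coefficients, the sketch below computes several of the named measures for a two-class problem from the four confusion-matrix counts. It is not taken from the paper: the function name `binary_metrics`, the helper `_ratio`, and the example counts are hypothetical. The formulas are the commonly used textbook definitions (with specificity taken as TN / (TN + FP)), which may differ from the conventions the authors adopt.

```python
import math


def _ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, tn, fp, fn):
    """Compute common accuracy measures from confusion-matrix counts.

    Assumed definitions (not confirmed against the paper):
      accuracy    = (TP + TN) / N
      sensitivity = TP / (TP + FN)
      specificity = TN / (TN + FP)
      correlation = Matthews correlation coefficient
    """
    total = tp + tn + fp + fn
    # The correlation coefficient is undefined when any marginal is zero;
    # this sketch reports 0.0 in that case.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": _ratio(tp + tn, total),
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "correlation": _ratio(tp * tn - fp * fn, denom),
    }


# Balanced example: every measure tells a similar story.
balanced = binary_metrics(tp=45, tn=45, fp=5, fn=5)

# Imbalanced example: a predictor that always answers "negative" still
# scores a high raw percentage, while the correlation is reported as 0.0.
always_negative = binary_metrics(tp=0, tn=95, fp=0, fn=5)
```

With the hypothetical counts above, the always-negative predictor reaches 95% raw accuracy but has zero sensitivity and zero correlation, which is one reason a single percentage can mislead on unbalanced classes.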
Pages: 412-424
Page count: 13