Genomic Similarity and Kernel Methods II: Methods for Genomic Information

Times cited: 59
Author
Schaid, Daniel J. [1]
Affiliation
[1] Mayo Clin, Div Biomed Stat & Informat, Rochester, MN 55905 USA
Keywords
Genomic pathways; Kernel; Networks; Smoothing spline ANOVA; Gene; Association; Power
DOI
10.1159/000312643
Chinese Library Classification number
Q3 [Genetics]
Subject classification codes
071007; 090102
Abstract
Measures of genomic similarity are often the basis of flexible statistical analyses. When they are based on kernel methods, they provide a powerful platform for drawing on a broad and deep body of statistical theory and a wide range of existing software; see the companion paper [1] for a review of this material. A kernel converts information for a pair of subjects (possibly complex or high-dimensional information) into a quantitative value representing either similarity or dissimilarity, with the requirement that, when applied to all pairs of subjects, it yields a positive semidefinite matrix. This approach offers enormous opportunities to enhance genetic analyses by incorporating a wide range of publicly available data as structured kernel 'prior' information. Kernel methods are appealing for their generality, yet this generality can make it challenging to formulate measures of similarity that directly address a specific scientific aim, or that have the most power to detect a specific genetic mechanism. Although it is difficult to create a cookbook of kernels for genetic studies, useful guidelines can be gleaned from a variety of novel published approaches. We review some of these kernel developments for specific analyses and speculate on how to build kernels for complex genomic attributes from publicly available data. The creativity of analysts, combined with rigorous evaluation on real and simulated data, will ultimately provide a much stronger array of kernel 'tools' for genetic analyses.
Copyright (C) 2010 S. Karger AG, Basel
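To make the positive-semidefinite (PSD) requirement concrete, here is a minimal sketch in plain Python (standard library only). It is an illustration under stated assumptions, not code from the paper. It assumes genotypes coded as minor-allele counts (0, 1, 2) and uses an identity-by-state (IBS) similarity, K_ij = sum_k (2 - |g_ik - g_jk|) / (2m) over m loci, as one example kernel. The helper names `ibs_kernel`, `jacobi_eigenvalues`, `is_psd`, and `double_center` are hypothetical. The PSD check computes eigenvalues with the cyclic Jacobi method. `double_center` applies the classical Gower construction, which turns a dissimilarity matrix D into a similarity matrix K = -1/2 J D^(2) J, with J = I - 11^T/n. This K is PSD exactly when D is Euclidean-embeddable, the property studied by Gower and Legendre (1986).

```python
import math


def ibs_kernel(genotypes):
    """IBS similarity for genotypes coded 0/1/2 (assumed coding).

    K[i][j] = sum_k (2 - |g_ik - g_jk|) / (2 * m), so K[i][i] = 1.
    """
    n, m = len(genotypes), len(genotypes[0])
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            shared = sum(2 - abs(a - b) for a, b in zip(genotypes[i], genotypes[j]))
            K[i][j] = K[j][i] = shared / (2 * m)
    return K


def jacobi_eigenvalues(A, tol=1e-18, max_sweeps=100):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations, sorted ascending."""
    n = len(A)
    a = [list(map(float, row)) for row in A]  # work on a float copy
    for _ in range(max_sweeps):
        off = sum(a[p][q] ** 2 for p in range(n) for q in range(n) if p != q)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the new a[p][q] is zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # update columns p and q: A <- A P
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # update rows p and q: A <- P^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


def is_psd(K, eps=1e-9):
    """True if the smallest eigenvalue is non-negative up to round-off."""
    return jacobi_eigenvalues(K)[0] >= -eps


def double_center(D):
    """Gower's construction: K = -1/2 * J * D^(2) * J with J = I - 11^T / n."""
    n = len(D)
    D2 = [[d * d for d in row] for row in D]
    row_mean = [sum(row) / n for row in D2]
    col_mean = [sum(D2[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row_mean) / n
    return [[-0.5 * (D2[i][j] - row_mean[i] - col_mean[j] + grand)
             for j in range(n)] for i in range(n)]


# Usage on toy data (three subjects, four SNPs; values are made up).
genotypes = [[0, 1, 2, 1], [0, 1, 1, 1], [2, 0, 0, 1]]
print(is_psd(ibs_kernel(genotypes)))  # True

# A symmetric "similarity" table that is NOT a valid kernel.
bad = [[1.0, 0.9, 0.0], [0.9, 1.0, 0.9], [0.0, 0.9, 1.0]]
print(jacobi_eigenvalues(bad)[0])  # negative, about -0.27

# Euclidean distances between points 0, 1, 3 on a line give a PSD similarity.
D = [[0, 1, 3], [1, 0, 2], [3, 2, 0]]
print(is_psd(double_center(D)))  # True
```

The IBS kernel passes the check for a known reason. Over genotype values, each locus contributes the 3x3 matrix [[2,1,0],[1,2,1],[0,1,2]], which is positive definite, and a sum of PSD kernels is PSD. The hand-built table fails even though every entry looks like a plausible similarity. This is the kind of pitfall that the PSD requirement guards against.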
Pages: 132-140
Number of pages: 9
Related papers
35 in total
  • [1] [Anonymous]. Semiparametric Regression. 2003.
  • [2] [Anonymous]. INTRO SUPPORT VECTOR. 2000. DOI 10.1017/CBO9780511801389
  • [3] Arita M, Tsuda K, Asai K. Modeling splicing sites with pairwise correlations. Bioinformatics, 2002, 18: S27-S34.
  • [4] Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G. Gene Ontology: tool for the unification of biology. Nature Genetics, 2000, 25(1): 25-29.
  • [5] Decoste D, Schölkopf B. Training invariant support vector machines. Machine Learning, 2002, 46(1-3): 161-190.
  • [6] Everitt BS. CLUSTER ANAL. 2001.
  • [7] Fraser AG, Marcotte EM. A probabilistic view of gene function. Nature Genetics, 2004, 36(6): 559-564.
  • [8] Friedman J, Hastie T, Tibshirani R. Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 2008, 9(3): 432-441.
  • [9] Gerstein MB, Bruce C, Rozowsky JS, Zheng D, Du J, Korbel JO, Emanuelsson O, Zhang ZD, Weissman S, Snyder M. What is a gene, post-ENCODE? History and updated definition. Genome Research, 2007, 17(6): 669-681.
  • [10] Gower JC, Legendre P. Metric and Euclidean properties of dissimilarity coefficients. Journal of Classification, 1986, 3(1): 5-48.