Gene set enrichment analysis using linear models and diagnostics

Times cited: 38
Authors
Oron, Assaf P. [1, 2]
Jiang, Zhen [3]
Gentleman, Robert [1]
Affiliations
[1] Fred Hutchinson Canc Res Ctr, Seattle, WA 98109 USA
[2] Univ Washington, Dept Stat, Seattle, WA 98195 USA
[3] Rosetta Inpharmat LLC, Seattle, WA 98109 USA
Funding
National Institutes of Health (NIH)
DOI
10.1093/bioinformatics/btn465
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Gene-set enrichment analysis (GSEA) can be greatly enhanced by linear-model (regression) diagnostic techniques. Diagnostics can be used to identify outlying or influential samples, and also to evaluate model fit and explore model expansion. Results: We demonstrate this methodology on an adult acute lymphoblastic leukemia (ALL) dataset, using GSEA based on chromosome-band mapping of genes. Individual residuals, grouped or aggregated by chromosomal loci, indicate problematic samples and potential data-entry errors, and help identify hyperdiploidy as a factor playing a key role in expression for this dataset. Subsequent analysis pinpoints suspected DNA copy number abnormalities in specific samples and chromosomes (most prevalent on chromosomes X, 21 and 14), and also reveals significant expression differences between the hyperdiploid and diploid groups on other chromosomes (most prominently 19, 22, 3 and 13); these differences are apparently not associated with copy number.
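The core idea in the abstract is to fit one linear model per gene and then pool each sample's residuals over the genes in a chromosomal gene set. This can be sketched roughly as follows. It is a minimal illustration under stated assumptions, not the authors' implementation. The following are all hypothetical choices made for this example:

- the NumPy code;
- the function names `fit_residuals` and `aggregate_by_set`;
- the sum-over-sqrt(set size) pooling statistic;
- the threshold of 3;
- the synthetic data.

The pooling treats genes as independent, which real expression data do not satisfy. The scores are therefore only a rough screen for samples whose residuals drift together on one locus, as a copy-number change would produce.

```python
import numpy as np


def fit_residuals(Y, X):
    """Fit one least-squares model per gene and return scaled residuals.

    Y: genes x samples expression matrix.
    X: samples x p design matrix (including an intercept column).
    Residuals are divided by each gene's residual SD; they are NOT
    leverage-adjusted (studentized), a simplification assumed here.
    """
    n, p = X.shape
    # Solve all genes at once: B has shape p x genes.
    B, *_ = np.linalg.lstsq(X, Y.T, rcond=None)
    R = Y - (X @ B).T  # raw residuals, genes x samples
    # Per-gene residual standard deviation on n - p degrees of freedom.
    sigma = np.sqrt((R ** 2).sum(axis=1) / (n - p))
    return R / sigma[:, None]


def aggregate_by_set(Z, gene_sets):
    """Pool each sample's scaled residuals over every gene set.

    Dividing the sum by sqrt(set size) gives roughly unit variance only
    if genes are independent, which is an optimistic assumption.
    """
    return {name: Z[idx].sum(axis=0) / np.sqrt(len(idx))
            for name, idx in gene_sets.items()}


# Hypothetical synthetic data: 200 genes, 30 samples, two groups of 15.
rng = np.random.default_rng(0)
group = np.repeat([0.0, 1.0], 15)
X = np.column_stack([np.ones(30), group])  # intercept + group indicator
Y = rng.normal(size=(200, 30)) + 0.5 * group  # group effect on every gene
Y[:50, 5] += 1.5  # plant a shared shift in sample 5 on the first 50 genes

# Hypothetical "chromosomal" gene sets given as row indices into Y.
gene_sets = {"chrA": np.arange(0, 50), "chrB": np.arange(50, 200)}
scores = aggregate_by_set(fit_residuals(Y, X), gene_sets)

# Flag samples whose pooled residual score exceeds 3 in magnitude.
flagged = {name: np.flatnonzero(np.abs(s) > 3.0).tolist()
           for name, s in scores.items()}
print(flagged)
```

In this toy setup, the planted shift in sample 5 should give it a large positive `chrA` score. With real data, gene-gene correlation within a chromosome would need to be handled before the scores can be read as z-values, for example by permutation.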
Pages: 2586-2591
Number of pages: 6