Bayesian factor models for the detection of coherent patterns in gene expression data

Cited by: 5
Authors
Mayrink, Vinicius D. [1]
Lucas, Joseph E. [2]
Affiliations
[1] Univ Fed Minas Gerais, Dept Estat, ICEx, BR-31270901 Belo Horizonte, MG, Brazil
[2] Duke Univ, Inst Genome Sci & Policy, Durham, NC 27708 USA
Keywords
Coherent; copy number alteration; detection call; factor model; high-throughput data; microarray; BREAST-CANCER; RNA-SEQ; IDENTIFICATION; ABERRATIONS; SUMMARIES; MUTATION; CALL; P53;
DOI
10.1214/13-BJPS226
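Chinese Library Classification
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Discipline classification code
020208 ; 070103 ; 0714 ;
Abstract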
A common problem in the analysis of gene expression microarray data is the identification of groups of features that are coherently expressed. For example, one often wishes to know whether a group of genes, clustered because of correlation in one data set, is still highly co-expressed in another data set. Alternatively, some expression array platforms use many relatively short probes for each gene of interest. In this case, a given probe may measure not its targeted gene but a different gene with a similar region (cross-hybridization). Accurate detection of the probe sets (groups of probes targeting the same gene) that demonstrate highly coherent expression patterns is the best approach to identifying which genes are present in the sample. We develop a Bayesian Factor Model (BFM) to address the general problem of detecting coherent patterns in gene expression data sets. We compare our method to "state of the art" methods for identifying expressed genes in both synthetic and real data sets, and the results indicate that the BFM outperforms the other procedures for detecting transcripts. We also demonstrate the use of factor analysis to identify the presence/absence status of gene modules (groups of coherently expressed genes). Variation in the number of copies of regions of the genome is a well-known and important feature of most cancers. We examine a group of genes representative of Copy Number Alteration (CNA) in breast cancer, then identify the presence/absence of CNA in this region of the genome for other cancers. Coherent patterns can also be evaluated in high-throughput sequencing data, a novel technology for measuring gene expression. We analyze this type of data via a factor model and examine the detection calls in terms of read mapping uncertainty.
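The idea of a "coherent pattern" within a probe set can be illustrated with a small simulation. The sketch below is not the authors' BFM and involves no Bayesian inference. It assumes a one-factor structure (each probe equals a shared factor times a loading, plus independent noise) and uses the share of variance captured by the leading eigenvalue of the probe correlation matrix as a rough stand-in for a coherence measure. The function names, the probe count of 11, the sample size, and the loading values are all hypothetical choices made for illustration.

```python
import numpy as np


def simulate_probe_set(n_probes, n_samples, loading, rng):
    """Simulate one probe set under an assumed one-factor structure.

    Each probe is loading * shared_factor + independent Gaussian noise.
    A loading of 0.0 mimics an absent transcript (pure noise).
    """
    factor = rng.standard_normal(n_samples)
    noise = rng.standard_normal((n_probes, n_samples))
    return loading * factor[None, :] + noise


def coherence_score(X):
    """Share of total variance captured by the leading eigenvalue
    of the probe-by-probe correlation matrix.

    X has shape (n_probes, n_samples). Values near 1 suggest the probes
    move together; values near 1 / n_probes suggest no shared pattern.
    """
    Z = X - X.mean(axis=1, keepdims=True)
    sd = Z.std(axis=1, keepdims=True)
    Z = Z / np.where(sd > 0, sd, 1.0)  # guard against constant probes
    corr = (Z @ Z.T) / X.shape[1]
    eigvals = np.linalg.eigvalsh(corr)  # ascending order
    return float(eigvals[-1] / eigvals.sum())


rng = np.random.default_rng(0)
present = simulate_probe_set(11, 200, loading=2.0, rng=rng)
absent = simulate_probe_set(11, 200, loading=0.0, rng=rng)

score_present = coherence_score(present)
score_absent = coherence_score(absent)
print(f"coherent probe set: {score_present:.2f}")
print(f"noise-only probe set: {score_absent:.2f}")
```

In this toy setting the coherent probe set scores well above the noise-only one. A detection call could then be made by thresholding the score, although the threshold would have to be calibrated, and a full Bayesian treatment would instead report posterior probabilities for the loadings.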
Pages: 1 / 33
Number of pages: 33