Statistical Test of Expression Pattern (STEPath): a new strategy to integrate gene expression data with genomic information in individual and meta-analysis studies

Cited by: 4
Authors
Martini, Paolo [1 ,2 ]
Risso, Davide [3 ]
Sales, Gabriele [3 ]
Romualdi, Chiara [2 ]
Lanfranchi, Gerolamo [1 ,2 ]
Cagnin, Stefano [1 ,2 ]
Affiliations
[1] Univ Padua, CRIBI Biotechnol Ctr, I-35121 Padua, Italy
[2] Univ Padua, Dept Biol, I-35121 Padua, Italy
[3] Univ Padua, Dept Stat Sci, I-35121 Padua, Italy
Source
BMC BIOINFORMATICS | 2011 / Vol. 12
Keywords
MOLECULAR DIAGNOSTICS APPLICATIONS; BREAST-CANCER; MULTIPLE BIOMARKERS; MICROARRAY DATA; CALPAIN; IDENTIFICATION; VALIDATION; DEFICIENCY; ABNORMALITIES; SIGNATURES;
DOI
10.1186/1471-2105-12-92
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: In recent decades, microarray technology has spread widely, leading to a dramatic increase in publicly available datasets. The first statistical tools focused on identifying significantly differentially expressed genes. Later, researchers moved toward the systematic integration of gene expression profiles with additional biological information, such as chromosomal location, ontological annotations or sequence features. Analysing gene expression in relation to the physical location of genes on chromosomes allows the identification of transcriptionally imbalanced regions, while Gene Set Analysis focuses on detecting coordinated changes in transcriptional levels among sets of biologically related genes. In this field, meta-analysis makes it possible to compare different studies that address the same biological question, and so to fully exploit public gene expression datasets.
Results: We describe STEPath, a method that starts from gene expression profiles and integrates the analysis of imbalanced regions as an a priori step before performing gene set analysis. The application of STEPath to individual studies produced gene set scores weighted by chromosomal activation. As a final step, we propose a way to compare these scores across different studies (meta-analysis) on related biological issues. One complication of meta-analysis is batch effects, which occur because molecular measurements are affected by laboratory conditions, reagent lots and personnel differences. Major problems arise when batch effects are correlated with an outcome of interest and lead to incorrect conclusions. We evaluated the power of combining chromosome mapping and gene set enrichment analysis by analysing a leukaemia dataset (an example of an individual study) and a dataset of skeletal muscle diseases (meta-analysis approach). In leukaemia, we identified the Hox gene set, a gene set closely related to the pathology that other gene set analysis algorithms do not identify, while the meta-analysis approach on muscular diseases discriminates between related pathologies and correlates similar ones from different studies.
Conclusions: STEPath is a new method that integrates gene expression profiles, genomic co-expressed regions and information about the biological function of genes. The use of STEPath-computed gene set scores overcomes batch effects in meta-analysis approaches, allowing the direct comparison of different pathologies and different studies at the gene set activation level.
Pages: 16
Related papers
71 in total
  • [31] KEGG: Kyoto Encyclopedia of Genes and Genomes
    Kanehisa, M
    Goto, S
    [J]. NUCLEIC ACIDS RESEARCH, 2000, 28 (01) : 27 - 30
  • [32] KEGG for representation and analysis of molecular networks involving diseases and drugs
    Kanehisa, Minoru
    Goto, Susumu
    Furumichi, Miho
    Tanabe, Mao
    Hirakawa, Mika
    [J]. NUCLEIC ACIDS RESEARCH, 2010, 38 : D355 - D360
  • [33] From genomics to chemical genomics: new developments in KEGG
    Kanehisa, Minoru
    Goto, Susumu
    Hattori, Masahiro
    Aoki-Kinoshita, Kiyoko F.
    Itoh, Masumi
    Kawashima, Shuichi
    Katayama, Toshiaki
    Araki, Michihiro
    Hirakawa, Mika
    [J]. NUCLEIC ACIDS RESEARCH, 2006, 34 : D354 - D357
  • [34] Expansion of the BioCyc collection of pathway/genome databases to 160 genomes
    Karp, PD
    Ouzounis, CA
    Moore-Kochlacs, C
    Goldovsky, L
    Kaipa, P
    Ahrén, D
    Tsoka, S
    Darzentas, N
    Kunin, V
    López-Bigas, N
    [J]. NUCLEIC ACIDS RESEARCH, 2005, 33 (19) : 6083 - 6089
  • [35] Dysferlin Deficiency Shows Compensatory Induction of Rab27A/Slp2a That May Contribute to Inflammatory Onset
    Kesari, Akanchha
    Fukuda, Mitsunori
    Knoblach, Susan
    Bashir, Rumaisa
    Nader, Gustavo A.
    Rao, Deepak
    Nagaraju, Kanneboyina
    Hoffman, Eric P.
    [J]. AMERICAN JOURNAL OF PATHOLOGY, 2008, 173 (05) : 1476 - 1487
  • [36] A multivariate approach for integrating genome-wide expression data and biological knowledge
    Kong, Sek Won
    Pu, William T.
    Park, Peter J.
    [J]. BIOINFORMATICS, 2006, 22 (19) : 2373 - 2380
  • [37] Mitochondrial abnormalities, energy deficit and oxidative stress are features of calpain 3 deficiency in skeletal muscle
    Kramerova, Irina
    Kudryashova, Elena
    Wu, Benjamin
    Germain, Sean
    Vandenborne, Krista
    Romain, Nadine
    Haller, Ronald G.
    Verity, M. Anthony
    Spencer, Melissa J.
    [J]. HUMAN MOLECULAR GENETICS, 2009, 18 (17) : 3194 - 3205
  • [38] MLL translocations, histone modifications and leukaemia stem-cell development
    Krivtsov, Andrei V.
    Armstrong, Scott A.
    [J]. NATURE REVIEWS CANCER, 2007, 7 (11) : 823 - 833
  • [39] A model-based scan statistic for identifying extreme chromosomal regions of gene expression in human tumors
    Levin, AM
    Ghosh, D
    Cho, KR
    Kardia, SLR
    [J]. BIOINFORMATICS, 2005, 21 (12) : 2867 - 2874
  • [40] Multiple biomarkers in molecular oncology. II. Molecular diagnostics applications in breast cancer management
    Malinowski, Douglas P.
    [J]. EXPERT REVIEW OF MOLECULAR DIAGNOSTICS, 2007, 7 (03) : 269 - 280