An additional k-means clustering step improves the biological features of WGCNA gene co-expression networks

Cited by: 191
Authors
Botia, Juan A. [1 ,2 ]
Vandrovcova, Jana [2 ]
Forabosco, Paola [3 ]
Guelfi, Sebastian [2 ]
D'Sa, Karishma [1 ,2 ]
Hardy, John [2 ]
Lewis, Cathryn M. [1 ]
Ryten, Mina [1 ,2 ]
Weale, Michael E. [1 ]
Affiliations
[1] UCL, Inst Neurol, Dept Mol Neurosci, Queen Sq, London WC1N, England
[2] Kings Coll London, Sch Med Sci, Dept Med & Mol Genet, Guys Hosp, London SE1 9RT, England
[3] Cittadella Univ Monserrato, CNR, Ist Ric Genet & Biomed, I-09042 Monserrato, CA, Italy
Funding
UK Medical Research Council;
Keywords
Gene co-expression networks on brain; K-means applied to WGCNA; Assessment of better gene clusters on bulk tissue; EXPRESSION DATA; GENOTYPE; INSIGHTS;
DOI
10.1186/s12918-017-0420-6
Chinese Library Classification (CLC) number
Q [Biological sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
Background: Weighted Gene Co-expression Network Analysis (WGCNA) is a widely used R software package for the generation of gene co-expression networks (GCN). WGCNA generates both a GCN and a derived partitioning of clusters of genes (modules). We propose k-means clustering as an additional processing step to conventional WGCNA, which we have implemented in the R package km2gcn (k-means to gene co-expression network, https://github.com/juanbot/km2gcn). Results: We assessed our method on networks created from UKBEC data (10 different human brain tissues), on networks created from GTEx data (42 human tissues, including 13 brain tissues), and on simulated networks derived from GTEx data. We observed substantially improved module properties, including: (1) few or zero misplaced genes; (2) increased counts of replicable clusters in alternate tissues (×3.1 on average); (3) improved enrichment of Gene Ontology terms (seen in 48/52 GCNs); (4) improved cell type enrichment signals (seen in 21/23 brain GCNs); and (5) more accurate partitions in simulated data according to a range of similarity indices. Conclusions: The results obtained from our investigations indicate that our k-means method, applied as an adjunct to standard WGCNA, results in better network partitions. These improved partitions enable more fruitful downstream analyses, as gene modules are more biologically meaningful.
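The idea of refining an existing module partition with a k-means-style pass can be illustrated with a small sketch. This is not the km2gcn code, and the abstract does not specify the centroid or distance definitions. The sketch assumes that each module's centroid is the first principal component of its genes (a module eigengene) and that a gene is closest to the centroid it correlates with most strongly. The function names `module_eigengene` and `refine_modules` and the `max_iter` default are hypothetical.

```python
import numpy as np


def module_eigengene(expr):
    """First principal component of a (samples x genes) block, one value per sample."""
    centered = expr - expr.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    eigengene = u[:, 0] * s[0]
    # Orient the component so that it follows the module's average profile.
    if np.dot(eigengene, centered.mean(axis=1)) < 0:
        eigengene = -eigengene
    return eigengene


def refine_modules(expr, labels, max_iter=30):
    """Reassign genes to the module whose eigengene they correlate with most.

    expr   : (samples x genes) expression matrix; assumes no constant genes
    labels : initial module label per gene, e.g. taken from a WGCNA run
    """
    labels = np.asarray(labels).copy()
    n_samples = expr.shape[0]
    # Standardise genes once so Pearson correlations reduce to dot products.
    z = (expr - expr.mean(axis=0)) / expr.std(axis=0)
    for _ in range(max_iter):
        # Modules that lost all their genes simply drop out here.
        modules = np.unique(labels)
        centroids = np.column_stack(
            [module_eigengene(expr[:, labels == m]) for m in modules]
        )
        zc = (centroids - centroids.mean(axis=0)) / centroids.std(axis=0)
        corr = z.T @ zc / n_samples  # genes x modules correlation matrix
        new_labels = modules[np.argmax(corr, axis=1)]
        if np.array_equal(new_labels, labels):
            break  # converged: no gene changed module
        labels = new_labels
    return labels
```

Calling `refine_modules(expr, initial_labels)` on a samples-by-genes matrix returns one refined label per gene. Using the initial partition as the starting point, rather than random centroids, keeps the number of modules at or below the number supplied.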
Pages: 16