Entropy-based consensus clustering for patient stratification

Cited by: 71
Authors
Liu, Hongfu [1]
Zhao, Rui [2,3]
Fang, Hongsheng [2,4]
Cheng, Feixiong [5,6,7]
Fu, Yun [1,8]
Liu, Yang-Yu [2,7]
Affiliations
[1] Northeastern University, Department of Electrical and Computer Engineering, Boston, MA 02115, USA
[2] Harvard Medical School, Brigham and Women's Hospital, Channing Division of Network Medicine, Boston, MA 02115, USA
[3] Harvard University, School of Engineering and Applied Sciences, Cambridge, MA 02138, USA
[4] Stanford University, Department of Statistics, Stanford, CA 94305, USA
[5] Northeastern University, Center for Complex Network Research, Boston, MA 02115, USA
[6] Northeastern University, Department of Physics, Boston, MA 02115, USA
[7] Dana-Farber Cancer Institute, Center for Cancer Systems Biology, Boston, MA 02115, USA
[8] Northeastern University, College of Computer and Information Science, Boston, MA 02115, USA
DOI
10.1093/bioinformatics/btx167
CLC classification number
Q5 [Biochemistry]
Discipline codes
071010; 081704
Abstract
Motivation: Patient stratification, or disease subtyping, is crucial for precision medicine and the personalized treatment of complex diseases. The increasing availability of high-throughput molecular data provides a great opportunity for patient stratification, and many clustering methods have been employed to tackle this problem in a purely data-driven manner. Yet existing methods that leverage high-throughput molecular data often suffer from various limitations, e.g. noise, data heterogeneity, high dimensionality or poor interpretability.

Results: Here we introduce an Entropy-based Consensus Clustering (ECC) method that overcomes these limitations altogether. ECC employs an entropy-based utility function to fuse many basic partitions into a consensus partition that agrees with the basic ones as much as possible. Maximizing the utility function in ECC has a much more meaningful interpretation than the objective of any other consensus clustering method. Moreover, we exactly map the complex utility maximization problem to the classic K-means clustering problem, which can then be solved efficiently with linear time and space complexity. ECC can also naturally integrate multiple molecular data types measured from the same set of subjects, and it easily handles missing values without any imputation. We applied ECC to 110 synthetic and 48 real datasets, including 35 cancer gene expression benchmark datasets and 13 cancer types with four molecular data types from The Cancer Genome Atlas, and found that ECC shows superior performance against existing clustering methods. Our results clearly demonstrate the power of ECC in clinically relevant patient stratification.
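The abstract describes ECC only at a high level. The following Python/NumPy code is a minimal sketch of the general idea, not the authors' implementation, and it rests on several assumptions that the abstract does not spell out. (1) The basic partitions are taken as given, each is one-hot encoded, and the blocks are stacked into one binary matrix. (2) The entropy-based utility is taken to be the summed Shannon mutual information between the consensus partition and each basic partition. (3) The equivalent K-means problem is assumed to use a KL-divergence distance on each one-hot block, with block-wise means as centroids. (4) A sample missing from a basic partition (label -1) simply contributes nothing for that block, which is one way to avoid imputation. The names `one_hot_blocks`, `consensus_kmeans_kl`, `mutual_information` and `entropy_utility` are hypothetical.

```python
import numpy as np

# Illustrative sketch only; the function names and the KL-distance formulation
# are assumptions, not the published ECC implementation.


def one_hot_blocks(basic_partitions):
    """Stack one-hot encodings of r basic partitions into a binary matrix B.

    basic_partitions: int array (n_samples, r); labels in each column are
    0..k_i-1, and -1 marks a sample missing from that basic partition.
    """
    n, r = basic_partitions.shape
    blocks, slices, start = [], [], 0
    for i in range(r):
        labels = basic_partitions[:, i]
        k_i = int(labels.max()) + 1
        block = np.zeros((n, k_i))
        present = labels >= 0
        block[np.flatnonzero(present), labels[present]] = 1.0  # missing rows stay all-zero
        blocks.append(block)
        slices.append(slice(start, start + k_i))
        start += k_i
    return np.hstack(blocks), slices


def consensus_kmeans_kl(basic_partitions, k, n_iter=100, seed=0, eps=1e-12):
    """K-means on B with a block-wise KL distance (ASSUMPTION: stand-in for ECC's mapping)."""
    B, slices = one_hot_blocks(basic_partitions)
    n = B.shape[0]
    rng = np.random.default_rng(seed)
    labels = rng.permutation(np.arange(n) % k)  # balanced random start, no empty cluster
    for _ in range(n_iter):
        centroids = np.zeros((k, B.shape[1]))
        for c in range(k):
            members = B[labels == c]
            for s in slices:
                # Average only over members observed in this basic partition (no imputation).
                observed = members[:, s].sum(axis=1) > 0
                if observed.any():
                    centroids[c, s] = members[observed, s].mean(axis=0)
        # For a one-hot block x, KL(x || m) = -log m[label]; all-zero (missing) blocks add 0.
        dist = -(B @ np.log(centroids + eps).T)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels


def entropy(p):
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))


def mutual_information(a, b):
    """Shannon mutual information I(a; b), ignoring samples where b == -1."""
    mask = b >= 0
    a, b = a[mask], b[mask]
    joint = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(joint, (a, b), 1.0)
    joint /= joint.sum()
    return entropy(joint.sum(axis=1)) + entropy(joint.sum(axis=0)) - entropy(joint.ravel())


def entropy_utility(consensus, basic_partitions):
    """ASSUMED utility: summed mutual information with every basic partition."""
    return sum(mutual_information(consensus, basic_partitions[:, i])
               for i in range(basic_partitions.shape[1]))


if __name__ == "__main__":
    # Toy data: 6 samples, 3 basic partitions; sample 5 is missing from the third.
    bps = np.array([[0, 1, 0], [0, 1, 0], [0, 1, 0],
                    [1, 0, 1], [1, 0, 1], [1, 0, -1]])
    pi = consensus_kmeans_kl(bps, k=2)
    print(pi, entropy_utility(pi, bps))
```

Because KL divergence is a Bregman divergence, the block-wise arithmetic mean remains the optimal centroid, so each iteration costs time linear in the number of samples, as in ordinary K-means. On the toy data the sketch groups samples 0 to 2 and samples 3 to 5, even though sample 5 has no label in the third basic partition.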
Pages: 2691-2698
Page count: 8