Nearest Consensus Clustering Classification to Identify Subclasses and Predict Disease

Cited by: 11
Authors
Alyousef A.A. [1]
Nihtyanova S. [2]
Denton C. [2]
Bosoni P. [3]
Bellazzi R. [3]
Tucker A. [1]
Affiliations
[1] Department of Computer Science, Brunel University London, Uxbridge
[2] UCL Royal Free Hospital, London
[3] University of Pavia, Pavia
Keywords
Classification; Consensus clustering; Disease subgroup discovery
DOI
10.1007/s41666-018-0029-6
Abstract
Disease subtyping, which helps to develop personalized treatments, remains a challenge in data analysis because there are many different ways to group patients based on their data. If we can identify subclasses of a disease, we can build models that are more specific to individuals, which should improve both prediction and understanding of the underlying characteristics of the disease in question. This paper proposes a new algorithm that integrates consensus clustering with classification in order to overcome issues with sample bias. The algorithm combines K-means with consensus clustering in order to build cohort-specific decision trees that improve classification and aid understanding of the underlying differences between the discovered groups. The methods are tested on a freely available real-world breast cancer dataset and on data from a London hospital on systemic sclerosis, a rare, potentially fatal condition. Results show that “nearest consensus clustering classification” significantly improves predictive accuracy compared with similar competing methods. © 2018, The Author(s).
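The abstract describes the pipeline only at a high level. Below is a minimal Python sketch of one way to read it, not the authors' implementation: K-means is run several times, rows that share a cluster in at least `threshold` of the runs form stable consensus groups, the remaining rows are attached to the nearest group centre, and one classifier is fitted per group. A new patient is routed to the nearest consensus group and classified by that group's model. All names (`consensus_clusters`, `fit_nccc`, `predict_nccc`, `threshold`, `min_size`) are hypothetical, and the farthest-first K-means initialisation, the greedy grouping rule, and the use of depth-1 decision trees (stumps) in place of full decision trees are simplifying assumptions chosen to keep the example short and dependency-free.

```python
# Hypothetical sketch of nearest-consensus-cluster classification (not the
# authors' code): names, grouping rule and depth-1 trees are assumptions.
import random
from collections import Counter


def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def centroid(points):
    return [sum(col) / len(points) for col in zip(*points)]


def nearest(x, centres):
    return min(range(len(centres)), key=lambda c: sq_dist(x, centres[c]))


def kmeans(X, k, seed, iters=50):
    """Lloyd's K-means: seeded first centre, farthest-first for the rest."""
    rng = random.Random(seed)
    centres = [X[rng.randrange(len(X))]]
    while len(centres) < k:
        centres.append(max(X, key=lambda x: min(sq_dist(x, c) for c in centres)))
    for _ in range(iters):
        labels = [nearest(x, centres) for x in X]
        for c in range(k):
            members = [x for x, lab in zip(X, labels) if lab == c]
            if members:  # an emptied cluster keeps its previous centre
                centres[c] = centroid(members)
    return labels


def consensus_clusters(X, k, runs=20, threshold=0.9, min_size=2):
    """Rows co-clustered in >= threshold of K-means runs form stable groups."""
    n = len(X)
    together = [[0] * n for _ in range(n)]  # co-association counts
    for r in range(runs):
        labels = kmeans(X, k, seed=r)
        for i in range(n):
            for j in range(n):
                together[i][j] += int(labels[i] == labels[j])
    groups, seen = [], set()
    for i in range(n):  # greedy: group each ungrouped row with its stable partners
        if i not in seen:
            g = [j for j in range(n)
                 if j not in seen and together[i][j] >= threshold * runs]
            seen.update(g)
            groups.append(g)
    core = [g for g in groups if len(g) >= min_size] or [list(range(n))]
    centres = [centroid([X[i] for i in g]) for g in core]
    # Rows outside any stable group are attached to the nearest group centre.
    assign = [nearest(X[i], centres) for i in range(n)]
    for c, g in enumerate(core):
        for i in g:
            assign[i] = c
    return assign, centres


def fit_stump(X, y):
    """Depth-1 decision tree (stand-in for a full tree) minimising errors."""
    maj = Counter(y).most_common(1)[0][0]
    best, best_err = (0, float("inf"), maj, maj), sum(t != maj for t in y)
    for f in range(len(X[0])):
        for thr in sorted({x[f] for x in X}):
            left = [t for x, t in zip(X, y) if x[f] <= thr]
            right = [t for x, t in zip(X, y) if x[f] > thr]
            if not right:
                continue  # threshold at the maximum value: no real split
            lm, rm = (Counter(s).most_common(1)[0][0] for s in (left, right))
            err = sum(t != lm for t in left) + sum(t != rm for t in right)
            if err < best_err:
                best, best_err = (f, thr, lm, rm), err
    return best


def predict_stump(stump, x):
    f, thr, left_label, right_label = stump
    return left_label if x[f] <= thr else right_label


def fit_nccc(X, y, k, runs=20, threshold=0.9):
    """Consensus clusters first, then one classifier per cluster."""
    assign, centres = consensus_clusters(X, k, runs, threshold)
    trees = [fit_stump([x for x, a in zip(X, assign) if a == c],
                       [t for t, a in zip(y, assign) if a == c])
             for c in range(len(centres))]
    return centres, trees


def predict_nccc(model, x):
    """Route x to its nearest consensus cluster, then apply that cluster's tree."""
    centres, trees = model
    return predict_stump(trees[nearest(x, centres)], x)


# Toy data: two cohorts in which feature 1 predicts opposite outcomes.
X = [[0.0, v] for v in (0.0, 0.1, 0.2, 0.3)] + [[10.0, v] for v in (0.0, 0.1, 0.2, 0.3)]
y = [0, 0, 1, 1, 1, 1, 0, 0]
model = fit_nccc(X, y, k=2, runs=5)
print([predict_nccc(model, x) for x in X])  # [0, 0, 1, 1, 1, 1, 0, 0]
```

On this toy data a single stump fitted to all eight rows does no better than predicting one class for everyone (4 of 8 correct), because feature 1 points in opposite directions in the two cohorts; one stump per consensus cluster recovers both rules. A fuller version would more likely use library components such as scikit-learn's `KMeans` and `DecisionTreeClassifier` rather than the hand-rolled versions here.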
Pages: 402-422
Number of pages: 20