Cluster analysis and its application to healthcare claims data: a study of end-stage renal disease patients who initiated hemodialysis

Cited by: 67
Authors
Liao, Minlei [1]
Li, Yunfeng [2]
Kianifard, Farid [3]
Obi, Engels [4]
Arcona, Stephen [2]
Affiliations
[1] KMK Consulting Inc, 23 Headquarters Plaza, Morristown, NJ 07960 USA
[2] Novartis Pharmaceut, Outcomes Res Methods & Analyt, US Hlth Econ & Outcomes Res, One Hlth Plaza, E Hanover, NJ 07936 USA
[3] Novartis Pharmaceut, Biometr, US Med, One Hlth Plaza, E Hanover, NJ 07936 USA
[4] Novartis Pharmaceut, Cardiovasc Resp, US Hlth Econ & Outcomes Res, One Hlth Plaza, E Hanover, NJ 07936 USA
Keywords
K-means cluster analysis; Hierarchical cluster analysis; Healthcare claims data; Cost changes; Dialysis; Hypertension; Nephrology; Outcomes
DOI
10.1186/s12882-016-0238-2
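Chinese Library Classification (CLC) number
R5 [Internal medicine]; R69 [Urology (diseases of the urogenital system)]
Subject classification code
1002; 100201
Abstract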
Background: Cluster analysis (CA) is a frequently used applied statistical technique that helps reveal hidden structures and "clusters" in large data sets. However, it has not been widely used in large healthcare claims databases, where the distribution of expenditure data is commonly severely skewed. The purpose of this study was to identify cost change patterns of patients with end-stage renal disease (ESRD) who initiated hemodialysis (HD) by applying different clustering methods.
Methods: A retrospective, cross-sectional, observational study was conducted using the Truven Health MarketScan® Research Databases. Patients aged >= 18 years with >= 2 ESRD diagnoses who initiated HD between 2008 and 2010 were included. The K-means CA method and hierarchical CA with various linkage methods were applied to all-cause costs within the baseline (12 months pre-HD) and follow-up (12 months post-HD) periods to identify clusters. Demographic, clinical, and cost information was extracted from both periods and then examined by cluster.
Results: A total of 18,380 patients were identified. Meaningful all-cause cost clusters were generated using K-means CA and hierarchical CA with either the flexible beta or Ward's method. Based on cluster sample sizes and change of cost patterns, the K-means CA method with 4 clusters was selected: Cluster 1: Average to High (n = 113); Cluster 2: Very High to High (n = 89); Cluster 3: Average to Average (n = 16,624); and Cluster 4: Increasing Costs, High at Both Points (n = 1554). From the 12-month pre-HD period to the 12-month post-HD period, median costs increased from $185,070 to $884,605 for Cluster 1 (Average to High), decreased from $910,930 to $157,997 for Cluster 2 (Very High to High), remained relatively stable and low ($15,168 to $13,026) for Cluster 3 (Average to Average), and increased from $57,909 to $193,140 for Cluster 4 (Increasing Costs, High at Both Points). Relatively stable costs after starting HD were associated with more stable comorbidity index scores between the pre- and post-HD periods, while increasing costs were associated with more sharply increasing comorbidity scores.
Conclusions: The K-means CA method appeared to be the most appropriate for healthcare claims data with highly skewed cost information when taking into account both the change of cost patterns and the sample size in the smallest cluster.
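
The clustering step described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the patient records are synthetic, the group medians are only loosely borrowed from the Results to produce right-skewed data, the helper names (`kmeans`, `fake_patient`) are hypothetical, and the log10 transform is a choice of this sketch (the abstract does not say whether costs were transformed). It runs plain Lloyd's K-means on each patient's pre-HD and post-HD cost pair, then reports cluster sizes and centers.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's K-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct observations
    labels = None
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        new_labels = [
            min(range(k),
                key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centers[c])))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return labels, centers


data_rng = random.Random(42)


def fake_patient(pre_median, post_median, sigma=0.3):
    """Synthetic (pre-HD, post-HD) cost pair; lognormal draws are right-skewed."""
    return (data_rng.lognormvariate(math.log(pre_median), sigma),
            data_rng.lognormvariate(math.log(post_median), sigma))


# (pre-HD median, post-HD median, group size): illustrative values only.
patterns = [(15_000, 13_000, 300), (58_000, 193_000, 60),
            (185_000, 885_000, 20), (911_000, 158_000, 20)]
costs = [fake_patient(pre, post) for pre, post, n in patterns for _ in range(n)]

# Assumption of this sketch: cluster on log10 costs to tame the skew.
features = [(math.log10(pre), math.log10(post)) for pre, post in costs]
labels, centers = kmeans(features, k=4)

for c, (pre_log, post_log) in enumerate(centers):
    print(f"cluster {c}: n={labels.count(c)}, "
          f"center pre=${10 ** pre_log:,.0f}, post=${10 ** post_log:,.0f}")
```

Because K-means depends on its starting centers, a real analysis would typically repeat the run from several seeds and compare the smallest cluster's size across solutions, which is the criterion the Conclusions emphasize.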
Pages: 14