Clinical charge profiles prediction for patients diagnosed with chronic diseases using Multi-level Support Vector Machine

Cited by: 10
Authors
Zhong, Wei [1]
Chow, Rick [1]
He, Jieyue [2]
Affiliations
[1] Univ S Carolina Upstate, Div Math & Comp Sci, Spartanburg, SC 29303 USA
[2] Southeast Univ, Sch Comp Sci & Engn, Nanjing 210096, Peoples R China
Keywords
Support Vector Machine; Classification problem; Multi-level clustering algorithm; Chronic disease and parallel algorithm; CLASSIFICATION; ALGORITHM;
DOI
10.1016/j.eswa.2011.08.036
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This research uses the national Healthcare Cost & Utilization Project (HCUP-3) databases to construct Support Vector Machine (SVM) classifiers that predict clinical charge profiles, including hospital charges and length of stay (LOS), for patients diagnosed with heart and circulatory disease, diabetes, and cancer, respectively. Clinical charge profile predictions can provide relevant clinical knowledge that helps healthcare policy makers manage healthcare services and costs effectively at the national, state, and local levels. Despite its solid mathematical foundation and promising experimental results, SVM is poorly suited to large-scale data mining tasks because its training time complexity is at least quadratic in the number of samples. Furthermore, traditional SVM classification algorithms cannot build an effective SVM when different data distribution patterns are intermingled in a large dataset. To enhance SVM training for large, complex, and noisy healthcare datasets, we propose the Multi-level Support Vector Machine (MLSVM), which organizes the dataset as clusters in a tree to produce better partitions for more effective SVM classification. The MLSVM model uses multiple SVMs, each of which efficiently learns the local data distribution patterns in a cluster. A decision fusion algorithm generates an effective global decision that incorporates local SVM decisions at different levels of the tree. Consequently, MLSVM can handle complex and often conflicting data distributions in large datasets more effectively than single-SVM approaches and multiple-SVM systems. Both the combined 5 x 2-fold cross-validation F test and the independent test show that the classification performance of MLSVM is much superior to that of CVM, ACSVM, and CSVM on three popular performance evaluation metrics. In this work, CSVM and MLSVM are parallelized to speed up the slow SVM training process for very large and complex datasets. Running time analysis shows that MLSVM can accelerate SVM's training process noticeably when the parallel algorithm is employed. (C) 2011 Elsevier Ltd. All rights reserved.
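
The abstract names the combined 5 x 2-fold cross-validation F test as its comparison procedure. The statistic defined by Alpaydin (1999) can be sketched as below; this is an illustrative helper rather than the authors' code, the function name `combined_5x2cv_f` is hypothetical, and the numbers in the usage example are made up. Under the null hypothesis that the two classifiers have the same error rate, the statistic approximately follows an F distribution with 10 and 5 degrees of freedom, so a value above roughly 4.74 rejects the null at the 0.05 level.

```python
from typing import Sequence


def combined_5x2cv_f(diffs: Sequence[Sequence[float]]) -> float:
    """Combined 5x2 cv F statistic for comparing two classifiers.

    diffs[i][j] is the difference in error rate between the two
    classifiers on fold j (0 or 1) of replication i (0..4).
    """
    if len(diffs) != 5 or any(len(pair) != 2 for pair in diffs):
        raise ValueError("expected 5 replications of 2 folds each")

    # Numerator: sum of all ten squared differences.
    numerator = sum(p * p for pair in diffs for p in pair)

    # Denominator: twice the sum of the five per-replication variances.
    variance_sum = 0.0
    for p1, p2 in diffs:
        mean = (p1 + p2) / 2.0
        variance_sum += (p1 - mean) ** 2 + (p2 - mean) ** 2

    if variance_sum == 0.0:
        raise ValueError("all per-replication variances are zero")

    return numerator / (2.0 * variance_sum)


if __name__ == "__main__":
    # Hypothetical error-rate differences, for illustration only.
    example = [
        [0.02, 0.04],
        [0.03, 0.01],
        [0.02, 0.02],
        [0.05, 0.03],
        [0.04, 0.02],
    ]
    print(combined_5x2cv_f(example))
```

The helper returns only the statistic; turning it into a p-value needs the F(10, 5) distribution from a statistics library.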
Pages: 1474-1483
Number of pages: 10