Privacy-preserving data mining of medical data using data separation-based techniques

Authors
Gang Kou [1]
Yi Peng [1]
Yong Shi [1, 2]
Zhengxin Chen [1]
Affiliations
[1] College of Information Science and Technology, University of Nebraska at Omaha, Omaha
[2] Research Center on Data Technology and Knowledge Economy, Graduate University of Chinese Academy of Sciences
Keywords
Classification; Medical data mining; Privacy-preserving data mining; Vertically partitioned data
DOI
10.2481/dsj.6.S429
Abstract
Data mining is concerned with extracting useful knowledge from various types of data, and medical data mining has recently become a popular topic within the field. Compared with other data mining areas, medical data mining has some unique characteristics: because medical records concern human subjects, privacy is taken more seriously than in other data mining tasks. This paper applies data separation-based techniques to preserve privacy in the classification of medical data. We take two approaches to protecting privacy: one is to partition the medical data vertically and mine the partitioned data at multiple sites; the other is to split the data horizontally across multiple sites. In the vertical partition approach, each site uses a subset of the attributes to compute its results, and a central trusted party assembles the distributed results using a majority-vote ensemble method. In the horizontal partition approach, the records are distributed among several sites; each site mines its own data, and a central trusted party is responsible for integrating these results. We implement both approaches using medical datasets from the UCI KDD archive and report the experimental results.
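The abstract does not say which classifier each site runs or exactly what the horizontal sites send to the trusted party. The sketch below is a minimal illustration under stated assumptions, not the authors' method: a nearest-centroid classifier is swapped in as the per-site model, every site in the vertical setting is assumed to also hold the class label, and in the horizontal setting the sites are assumed to send per-class attribute sums and counts. All function names (`local_statistics`, `merge_statistics`, `vertical_site_predictions`, `majority_vote`) and the toy data are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Row = Sequence[float]
Centroids = Dict[int, List[float]]
Stats = Tuple[Dict[int, List[float]], Counter]


def local_statistics(rows: Sequence[Row], labels: Sequence[int]) -> Stats:
    """Per-class attribute sums and record counts over one site's records."""
    sums: Dict[int, List[float]] = {}
    counts: Counter = Counter()
    for row, y in zip(rows, labels):
        acc = sums.setdefault(y, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[y] += 1
    return sums, counts


def merge_statistics(all_stats: Sequence[Stats]) -> Centroids:
    """Trusted party: add up per-site sums and counts, then form class centroids."""
    total_sums: Dict[int, List[float]] = {}
    total_counts: Counter = Counter()
    for sums, counts in all_stats:
        for y, vec in sums.items():
            acc = total_sums.setdefault(y, [0.0] * len(vec))
            for j, value in enumerate(vec):
                acc[j] += value
        total_counts.update(counts)
    return {y: [s / total_counts[y] for s in acc] for y, acc in total_sums.items()}


def train_centroids(rows: Sequence[Row], labels: Sequence[int]) -> Centroids:
    """Ordinary single-site nearest-centroid training (assumed stand-in model)."""
    return merge_statistics([local_statistics(rows, labels)])


def predict(centroids: Centroids, row: Row) -> int:
    """Label of the centroid with the smallest squared Euclidean distance."""
    def dist2(y: int) -> float:
        return sum((a - b) ** 2 for a, b in zip(row, centroids[y]))
    return min(centroids, key=dist2)


# Vertical partition: each site holds some attribute columns of every record.
def vertical_site_predictions(rows, labels, columns, test_rows) -> List[int]:
    """One site trains on its own columns only and releases only predicted labels."""
    def project(r: Row) -> List[float]:
        return [r[j] for j in columns]
    model = train_centroids([project(r) for r in rows], labels)
    return [predict(model, project(r)) for r in test_rows]


def majority_vote(site_predictions: Sequence[Sequence[int]]) -> List[int]:
    """Trusted party: for each test record, the label predicted by most sites."""
    # Ties go to the label seen first; an odd number of sites avoids two-class ties.
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*site_predictions)]


# Toy data, invented for illustration (not from the UCI datasets).
train = [[1, 1, 1], [2, 1, 2], [1, 2, 1], [8, 9, 8], [9, 8, 9], [8, 8, 9]]
labels = [0, 0, 0, 1, 1, 1]
test = [[1.5, 9.0, 1.5], [8.5, 8.5, 8.5]]

# Vertical: three sites, one attribute column each.
votes = [vertical_site_predictions(train, labels, [j], test) for j in range(3)]
print(votes)                 # [[0, 1], [1, 1], [0, 1]]
print(majority_vote(votes))  # [0, 1]

# Horizontal: two sites, each holding a subset of the records.
site_a = local_statistics([train[i] for i in (0, 1, 3)], [labels[i] for i in (0, 1, 3)])
site_b = local_statistics([train[i] for i in (2, 4, 5)], [labels[i] for i in (2, 4, 5)])
merged = merge_statistics([site_a, site_b])
print(merged == train_centroids(train, labels))  # True
print([predict(merged, r) for r in test])        # [0, 1]
```

In the vertical run, the second site alone misclassifies the first test record, and the majority vote of the other two sites overrides it. In the horizontal run, adding per-site sums and counts reproduces exactly the centroids a single site would compute from the pooled records, so no raw record has to leave its site; the aggregates themselves still carry some information about the data.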
Pages: S429-S434
Page count: 5