Privacy-preserving data mining in electronic surveys

Cited by: 0
Authors
Zhan, J [1]
Matwin, S [1]
Affiliation
[1] Univ Ottawa, Sch Informat Technol & Engn, Ottawa, ON, Canada
Source
SHAPING BUSINESS STRATEGY IN A NETWORKED WORLD, VOLS 1 AND 2, PROCEEDINGS | 2004
Keywords
privacy; data mining; randomization
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Electronic surveys are an important resource in data mining. However, protecting respondents' data privacy during a survey remains a challenge for the security and privacy community. In this paper, we develop a scheme to solve the problem of privacy-preserving data mining in electronic surveys. We propose a randomized response technique to collect the data from the respondents. We then demonstrate how to perform data mining computations on randomized data. Specifically, we apply our scheme to build a Naive Bayesian classifier from randomized data. Our experimental results indicate that the classification accuracy of our scheme, when private data is protected by randomization, is close to the accuracy of a classifier built from the same data with total disclosure of private information. Finally, we develop a measure to quantify the privacy achieved by our proposed scheme.
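The abstract does not spell out the randomization scheme, so the following is only a minimal sketch of the classic single-attribute randomized response model, not the authors' exact method. It assumes each respondent reports the true yes/no answer with probability `theta` and the opposite answer otherwise; the function names and the parameter `theta` are illustrative assumptions. Proportions recovered this way are the kind of quantity a Naive Bayesian classifier needs when only randomized answers are available.

```python
import random


def randomize(answer: bool, theta: float, rng: random.Random) -> bool:
    """Report the true answer with probability theta, else the opposite.

    Hypothetical helper: theta is an assumed, publicly known parameter.
    """
    return answer if rng.random() < theta else not answer


def estimate_true_proportion(reported_yes: float, theta: float) -> float:
    """Recover the true 'yes' proportion p from the reported one.

    P(report yes) = theta * p + (1 - theta) * (1 - p), which gives
    p = (reported_yes + theta - 1) / (2 * theta - 1).
    """
    if abs(2 * theta - 1) < 1e-12:
        raise ValueError("theta = 0.5 makes the true proportion unrecoverable")
    return (reported_yes + theta - 1) / (2 * theta - 1)


def simulate(true_p: float, theta: float, n: int, seed: int = 0) -> float:
    """Simulate n respondents and return the estimated true proportion."""
    rng = random.Random(seed)
    yes_count = 0
    for _ in range(n):
        truth = rng.random() < true_p          # respondent's private answer
        if randomize(truth, theta, rng):       # only this value is collected
            yes_count += 1
    return estimate_true_proportion(yes_count / n, theta)
```

For example, `simulate(0.3, 0.8, 20000)` should return a value near 0.3 even though no individual answer can be trusted. Values of `theta` closer to 0.5 give respondents more privacy but make the estimate noisier.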
Pages: 1179-1185
Page count: 7