A local asynchronous distributed privacy preserving feature selection algorithm for large peer-to-peer networks

Cited by: 25
Authors
Das, Kamalika [1]
Bhaduri, Kanishka [2]
Kargupta, Hillol [3, 4]
Affiliations
[1] Stinger Ghaffarian Technol Inc, IDU Grp, NASA Ames Res Ctr, Moffett Field, CA 94035 USA
[2] Mission Critical Technol Inc, IDU Grp, NASA Ames Res Ctr, Moffett Field, CA 94035 USA
[3] Univ Maryland, Dept CSEE, Baltimore, MD 21250 USA
[4] AGNIK LLC, Baltimore, MD USA
Keywords
Privacy preserving; Data mining; Feature selection; Distributed computation
DOI
10.1007/s10115-009-0274-3
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper we develop a local distributed privacy preserving algorithm for feature selection in a large peer-to-peer environment. Feature selection is often used in machine learning for data compaction and efficient learning by eliminating the curse of dimensionality. There exist many solutions for feature selection when the data are located at a central location. However, it becomes extremely challenging to perform the same when the data are distributed across a large number of peers or machines. Centralizing the entire dataset or portions of it can be very costly and impractical because of the large number of data sources, the asynchronous nature of peer-to-peer networks, the dynamic nature of the data and network, and privacy concerns. The solution proposed in this paper allows us to perform feature selection in an asynchronous fashion with a low communication overhead where each peer can specify its own privacy constraints. The algorithm works based on local interactions among participating nodes. We present results on a real-world dataset in order to test the performance of the proposed algorithm.
Pages: 341-367
Page count: 27