Efficient data sampling in heterogeneous peer-to-peer networks

Cited by: 0
Authors
Arai, Benjamin [1]
Lin, Song [1]
Gunopulos, Dimitrios [1]
Affiliations
[1] Univ Calif Riverside, Dept Comp Sci & Engn, Riverside, CA 92521 USA
Source
ICDM 2007: PROCEEDINGS OF THE SEVENTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING | 2007
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Performing data-mining tasks such as clustering, classification, and prediction on large datasets is an arduous task and, many times, it is an infeasible task given current hardware limitations. The distributed nature of peer-to-peer databases further complicates this issue by introducing an access overhead cost in addition to the cost of sending individual tuples over the network. We propose a two-level sampling approach focusing on peer-to-peer databases for maximizing sample quality given a user-defined communication budget. Given that individual peers may have varying cardinality, we propose an algorithm for determining the optimal sample rate (the percentage of tuples to sample from a peer) for each peer. We do this by analyzing the variance of individual peers, ultimately minimizing the total variance of the entire sample. By performing local optimization of individual peer sample rates, we maximize approximation accuracy of the samples. We also offer several techniques for sampling in peer-to-peer databases given various amounts of known and unknown information about the network and its peers.
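Illustrative sketch (not from the paper): the abstract does not give the authors' algorithm, so the Python below substitutes the classical Neyman allocation from stratified sampling to show how per-peer sample rates can be chosen to reduce total variance under a communication budget. The `Peer` fields, the `allocate_sample_rates` function, the choice to contact every peer, the uniform `tuple_cost`, and the assumption that each peer's standard deviation is known in advance are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    name: str
    cardinality: int    # number of tuples stored at the peer
    std_dev: float      # assumed-known standard deviation of the sampled attribute
    access_cost: float  # fixed overhead paid once for contacting the peer


def allocate_sample_rates(peers, budget, tuple_cost=1.0):
    """Return {peer name: sample rate in [0, 1]} using Neyman allocation.

    Assumption: every peer is contacted, so all access overheads are paid
    first and the rest of the budget is spent on transferring tuples.
    """
    remaining = budget - sum(p.access_cost for p in peers)
    if remaining <= 0:
        raise ValueError("budget does not cover the access overhead")
    tuples_affordable = remaining / tuple_cost

    rates = {}
    active = list(peers)
    while active:
        # Neyman weights: cardinality times standard deviation.
        weight_sum = sum(p.cardinality * p.std_dev for p in active)
        if weight_sum == 0:
            # No variance left to reduce among the remaining peers.
            for p in active:
                rates[p.name] = 0.0
            break
        # A peer's rate is (its share of tuples) / cardinality,
        # which simplifies to tuples_affordable * std_dev / weight_sum.
        capped = [p for p in active
                  if tuples_affordable * p.std_dev / weight_sum >= 1.0]
        if not capped:
            for p in active:
                rates[p.name] = tuples_affordable * p.std_dev / weight_sum
            break
        # Peers that would be over-sampled are read in full, and the
        # leftover tuple budget is redistributed among the others.
        for p in capped:
            rates[p.name] = 1.0
            tuples_affordable -= p.cardinality
        active = [p for p in active if p not in capped]
    return rates


if __name__ == "__main__":
    demo = [Peer("A", 1000, 2.0, 5.0), Peer("B", 1000, 1.0, 5.0)]
    print(allocate_sample_rates(demo, budget=310.0))
```

In this sketch a peer with higher variance receives a proportionally higher sample rate, and a peer whose allocation would exceed its cardinality is simply read in full. Which peers to contact in the first place, the other half of a two-level scheme, is left out.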
Pages: 23 - 32
Page count: 10
Related papers
50 in total
  • [1] Dynamics of heterogeneous peer-to-peer networks
    Paganini, Fernando
    Ferragut, Andres
    Zubeldia, Martin
    2013 IEEE 52ND ANNUAL CONFERENCE ON DECISION AND CONTROL (CDC), 2013, : 3293 - 3298
  • [2] Proportional fairness in heterogeneous peer-to-peer networks through reciprocity and Gibbs sampling
    Zubeldia, Martin
    Ferragut, Andres
    Paganini, Fernando
    2013 51ST ANNUAL ALLERTON CONFERENCE ON COMMUNICATION, CONTROL, AND COMPUTING (ALLERTON), 2013, : 123 - 130
  • [3] Towards reliable and efficient data dissemination in heterogeneous peer-to-peer systems
    Li, Zhenyu
    Xie, Gaogang
    Li, Zhongcheng
    2008 IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL & DISTRIBUTED PROCESSING, VOLS 1-8, 2008, : 974 - 985
  • [4] Efficient flooding in peer-to-peer networks
    Wu, Ai
    Liu, Xinsong
    Liu, Kejian
    7TH INTERNATIONAL CONFERENCE ON COMPUTER-AIDED INDUSTRIAL DESIGN & CONCEPTUAL DESIGN, 2006, : 61 - +
  • [5] On Unbiased Sampling for Unstructured Peer-to-Peer Networks
    Stutzbach, Daniel
    Rejaie, Reza
    Duffield, Nick
    Sen, Subhabrata
    Willinger, Walter
    IEEE-ACM TRANSACTIONS ON NETWORKING, 2009, 17 (02) : 377 - 390
  • [6] Energy efficient data retrieval and caching in mobile peer-to-peer networks
    Joseph, MS
    Kumar, M
    Shen, HP
    Das, S
    Third IEEE International Conference on Pervasive Computing and Communications, Workshops, 2005, : 50 - 54
  • [7] Efficient multi-source data dissemination in Peer-to-Peer networks
    Li, Zhenyu
    Zhu, Zengyang
    Xie, Gaogang
    Li, Zhongcheng
    NETWORKING 2008: AD HOC AND SENSOR NETWORKS, WIRELESS NETWORKS, NEXT GENERATION INTERNET, PROCEEDINGS, 2008, 4982 : 409 - 420
  • [8] Efficient search in unstructured peer-to-peer networks
    Cholvi, V
    Felber, P
    Biersack, E
    EUROPEAN TRANSACTIONS ON TELECOMMUNICATIONS, 2004, 15 (06) : 535 - 548
  • [9] Efficient content authentication in peer-to-peer networks
    Tamassia, Roberto
    Triandopoulos, Nikos
    APPLIED CRYPTOGRAPHY AND NETWORK SECURITY, PROCEEDINGS, 2007, 4521 : 354 - +
  • [10] Efficient skyline retrieval on peer-to-peer networks
    Zhu, Lin
    Zhou, Shuigeng
    Guan, Jihong
    PROCEEDINGS OF FUTURE GENERATION COMMUNICATION AND NETWORKING, WORKSHOP PAPERS, VOL 2, 2007, : 309 - +