Efficient data sampling in heterogeneous peer-to-peer networks

Cited by: 0
Authors
Arai, Benjamin [1]
Lin, Song [1]
Gunopulos, Dimitrios [1]
Affiliations
[1] Univ Calif Riverside, Dept Comp Sci & Engn, Riverside, CA 92521 USA
Source
ICDM 2007: PROCEEDINGS OF THE SEVENTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING | 2007
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Performing data-mining tasks such as clustering, classification, and prediction on large datasets is arduous and, in many cases, infeasible given current hardware limitations. The distributed nature of peer-to-peer databases further complicates this issue by introducing an access overhead cost in addition to the cost of sending individual tuples over the network. We propose a two-level sampling approach focusing on peer-to-peer databases for maximizing sample quality given a user-defined communication budget. Given that individual peers may have varying cardinality, we propose an algorithm for determining the optimal sample rate (the percentage of tuples to sample from a peer) for each peer. We do this by analyzing the variance of individual peers, ultimately minimizing the total variance of the entire sample. By performing local optimization of individual peer sample rates, we maximize the approximation accuracy of the samples. We also offer several techniques for sampling in peer-to-peer databases given various amounts of known and unknown information about the network and its peers.
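The abstract does not give the allocation algorithm itself, so the following is only an illustrative sketch of the general idea of choosing per-peer sample rates under a communication budget. It swaps in the classic Neyman allocation from stratified sampling (sample sizes proportional to cardinality times standard deviation), which is not necessarily the authors' method. The function name `allocate_sample_rates`, the cost model (a fixed `access_cost` per peer plus a `tuple_cost` per sampled tuple), and the assumption that each peer's cardinality and standard deviation are known are all hypothetical.

```python
def allocate_sample_rates(peers, budget, access_cost, tuple_cost):
    """Return one sample rate (fraction of tuples) per peer.

    peers: list of (cardinality, std_dev) pairs, assumed known in advance.
    budget: total communication budget (hypothetical cost units).
    access_cost: fixed overhead paid once for every peer contacted.
    tuple_cost: cost of shipping one sampled tuple.

    Uses Neyman allocation (n_i proportional to N_i * S_i) as a stand-in;
    this is an assumption, not the paper's algorithm.
    """
    # Pay the fixed access overhead for every peer first.
    tuple_budget = budget - access_cost * len(peers)
    if tuple_budget <= 0:
        raise ValueError("budget does not cover the access overhead")
    total_tuples = tuple_budget / tuple_cost

    # Peers that are larger or more variable get a larger share.
    weights = [n * s for n, s in peers]
    if sum(weights) == 0:
        # No variance information: fall back to proportional allocation.
        weights = [n for n, _ in peers]
    total_weight = sum(weights)

    rates = []
    for (n, _), w in zip(peers, weights):
        if n == 0 or total_weight == 0:
            rates.append(0.0)
            continue
        n_i = total_tuples * w / total_weight
        # A rate cannot exceed 1.0; any budget left over by the cap
        # is simply unused in this sketch.
        rates.append(min(1.0, n_i / n))
    return rates


# Two equally sized peers; the first has twice the standard deviation.
rates = allocate_sample_rates(
    [(1000, 2.0), (1000, 1.0)], budget=320, access_cost=10, tuple_cost=1
)
```

In this usage example the peer with the higher standard deviation receives twice the sample rate of the other, which reflects the intuition in the abstract that per-peer variance should drive the sample rate.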
Pages: 23-32
Number of pages: 10
Related papers
50 items in total
  • [31] Efficient Range Queries in Spatial Peer-to-Peer Networks
    Mustafa, Ahmed
    Al Aghbari, Zaher
    Kamel, Ibrahim
    IIT: 2008 INTERNATIONAL CONFERENCE ON INNOVATIONS IN INFORMATION TECHNOLOGY, 2008, : 193 - +
  • [32] An efficient algorithm for resource sharing in peer-to-peer networks
    Liao, Wei-Cherng
    Papadopoulos, Fragkiskos
    Psounis, Konstantinos
    NETWORKING 2006: NETWORKING TECHNOLOGIES, SERVICES, AND PROTOCOLS; PERFORMANCE OF COMPUTER AND COMMUNICATION NETWORKS; MOBILE AND WIRELESS COMMUNICATIONS SYSTEMS, 2006, 3976 : 592 - 605
  • [33] An Efficient Cache Strategy in Structured Peer-to-Peer Networks
    Chou, Shin-Yi
    Chen, Yu-Wei
    SOFTWARE AND COMPUTER APPLICATIONS, 2011, 9 : 38 - 41
  • [34] RIPPNET: Efficient Range Indexing in Peer-to-Peer Networks
    Ryeng, Norvald H.
    Norvag, Kjetil
    2008 THIRD INTERNATIONAL CONFERENCE ON DIGITAL INFORMATION MANAGEMENT, VOLS 1 AND 2, 2008, : 187 - 194
  • [35] Efficient approximate query processing in peer-to-peer networks
    Arai, Benjamin
    Das, Gautam
    Gunopulos, Dimitrios
    Kalogeraki, Vana
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2007, 19 (07) : 919 - 933
  • [36] Efficient skyline query processing on peer-to-peer networks
    Wang, Shiyuan
    Ooi, Beng Chin
    Tung, Anthony K. H.
    Xu, Lizhen
    2007 IEEE 23RD INTERNATIONAL CONFERENCE ON DATA ENGINEERING, VOLS 1-3, 2007, : 1101 - +
  • [37] Efficient Hierarchical Quorums in Unstructured Peer-to-Peer Networks
    Henry, Kevin
    Swanson, Colleen
    Xie, Qi
    Daudjee, Khuzaima
    ON THE MOVE TO MEANINGFUL INTERNET SYSTEMS: OTM 2009, PT 1, 2009, 5870 : 183 - 200
  • [38] An efficient clustering scheme in mobile peer-to-peer networks
    Zuo, Ke
    Hu, Dongmin
    Wang, Huaimin
    Wu, Quanyuan
    Su, Liang
    2008 THE INTERNATIONAL CONFERENCE ON INFORMATION NETWORKING, 2008, : 292 - 296
  • [39] An Efficient Search Scheme in Unstructured Peer-to-Peer Networks
    Gong, Yadong
    Deng, Heping
    Gu, Zhanran
    Hu, Jiye
    Wen, Yongxiang
    MECHATRONICS AND INTELLIGENT MATERIALS, PTS 1 AND 2, 2011, 211-212 : 295 - +
  • [40] Efficient and Scalable Consistency Maintenance for Heterogeneous Peer-to-Peer Systems
    Li, Zhenyu
    Xie, Gaogang
    Li, Zhongcheng
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2008, 19 (12) : 1695 - 1708