Efficient data sampling in heterogeneous peer-to-peer networks

Cited by: 0
Authors
Arai, Benjamin [1]
Lin, Song [1]
Gunopulos, Dimitrios [1]
Affiliations
[1] Univ Calif Riverside, Dept Comp Sci & Engn, Riverside, CA 92521 USA
Source
ICDM 2007: PROCEEDINGS OF THE SEVENTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING | 2007
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Performing data-mining tasks such as clustering, classification, and prediction on large datasets is an arduous task and is often infeasible given current hardware limitations. The distributed nature of peer-to-peer databases further complicates this issue by introducing an access overhead cost in addition to the cost of sending individual tuples over the network. We propose a two-level sampling approach focusing on peer-to-peer databases for maximizing sample quality given a user-defined communication budget. Given that individual peers may have varying cardinality, we propose an algorithm for determining the optimal sample rate (the percentage of tuples to sample from a peer) for each peer. We do this by analyzing the variance of individual peers, ultimately minimizing the total variance of the entire sample. By performing local optimization of individual peer sample rates, we maximize the approximation accuracy of the samples. We also offer several techniques for sampling in peer-to-peer databases given various amounts of known and unknown information about the network and its peers.
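The abstract does not spell out the allocation algorithm, so the following is only an illustrative sketch of the general idea of choosing per-peer sample rates under a communication budget. It uses classic Neyman allocation from stratified sampling as a stand-in, not the authors' method. The function name, the cost parameters `access_cost` and `tuple_cost`, and the assumption that each peer's size and standard deviation are known in advance are all hypothetical.

```python
def neyman_sample_rates(sizes, stddevs, budget, access_cost=0.0, tuple_cost=1.0):
    """Split a communication budget across peers (illustrative only).

    sizes[i]    -- number of tuples held by peer i (assumed known)
    stddevs[i]  -- standard deviation of the target attribute on peer i (assumed known)
    budget      -- total communication budget, in the same units as the costs
    access_cost -- hypothetical fixed cost of contacting one peer
    tuple_cost  -- hypothetical cost of shipping one tuple
    Returns one sample rate in [0, 1] per peer.
    """
    if len(sizes) != len(stddevs):
        raise ValueError("sizes and stddevs must have the same length")

    # Pay the access overhead for every peer first; the rest buys tuples.
    remaining = budget - access_cost * len(sizes)
    if remaining <= 0:
        raise ValueError("budget does not cover the access overhead")
    total_tuples = remaining / tuple_cost

    # Neyman allocation: sample counts proportional to size * stddev.
    weights = [n * s for n, s in zip(sizes, stddevs)]
    total_weight = sum(weights)
    if total_weight == 0:
        # No variance anywhere: fall back to allocation proportional to size.
        weights = list(sizes)
        total_weight = sum(weights)

    rates = []
    for n, w in zip(sizes, weights):
        if n == 0 or total_weight == 0:
            rates.append(0.0)
            continue
        count = total_tuples * w / total_weight
        # Cap at 1.0; this simple sketch does not redistribute any surplus.
        rates.append(min(1.0, count / n))
    return rates


rates = neyman_sample_rates([1000, 1000], [1.0, 3.0], budget=420, access_cost=10)
print(rates)
```

In this toy call, two equally sized peers share the budget left after access overhead, and the peer with the larger standard deviation receives the higher sample rate, which mirrors the variance-driven intuition described in the abstract.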
Pages: 23-32
Page count: 10
Related papers
50 in total
  • [21] Distributed data mining in peer-to-peer networks
    Datta, Souptik
    Bhaduri, Kanishka
    Giannella, Chris
    Kargupta, Hillol
    Wolff, Ran
    IEEE INTERNET COMPUTING, 2006, 10 (04) : 18 - 26
  • [22] Data indexing in peer-to-peer DHT networks
    Garcés-Erice, L
    Felber, PA
    Biersack, EW
    Urvoy-Keller, G
    Ross, KW
    24TH INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS, PROCEEDINGS, 2004, : 200 - 208
  • [23] Data replication mechanisms in the peer-to-peer networks
    Mohammadi, Behnaz
    Navimipour, Nima Jafari
    INTERNATIONAL JOURNAL OF COMMUNICATION SYSTEMS, 2019, 32 (14)
  • [24] Protecting sensitive data in peer-to-peer networks
    Nejdl, W
    Olmedilla, D
    IEEE INTELLIGENT SYSTEMS, 2004, 19 (05) : 82 - 85
  • [25] PathFinder: Efficient Lookups and Efficient Search in Peer-to-Peer Networks
    Bradler, Dirk
    Krumov, Lachezar
    Muhlhauser, Max
    Kangasharju, Jussi
    DISTRIBUTED COMPUTING AND NETWORKING, 2011, 6522 : 77 - +
  • [26] Efficient peer-to-peer data dissemination in mobile ad-hoc networks
    Goel, SK
    Singh, M
    Xu, DY
    Li, BC
    2002 INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING, PROCEEDINGS OF THE WORKSHOPS, 2002, : 152 - 158
  • [27] Efficient approximate query processing in peer-to-peer networks
    IEEE
    Unknown
    Unknown
    IEEE Trans Knowl Data Eng, 2007, 7 (919-933):
  • [28] Peer-to-Peer Networks
    Lin Yu1
    2. Peking University
    ZTECommunications, 2006, (01) : 53 - 57
  • [29] Optimally Efficient Multicast in Structured Peer-to-Peer Networks
    Bradler, Dirk
    Kangasharju, Jussi
    Muehlhaeuser, Max
    2009 6TH IEEE CONSUMER COMMUNICATIONS AND NETWORKING CONFERENCE, VOLS 1 AND 2, 2009, : 123 - +
  • [30] An efficient adaptive strategy for searching in peer-to-peer networks
    Gatani, Luca
    Lo Re, Giuseppe
    MULTIAGENT AND GRID SYSTEMS, 2005, 1 (03) : 209 - 224