BRDPHHC: A Balance RDF Data Partitioning Algorithm based on Hybrid Hierarchical Clustering

Times cited: 1
Authors
Leng, Yonglin [1]
Chen, Zhikui [1]
Zhong, Fangming [1]
Zhong, Hua [1]
Affiliation
[1] Dalian Univ Technol, Sch Software Technol, Dalian 116620, Peoples R China
Source
2015 IEEE 17TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS, 2015 IEEE 7TH INTERNATIONAL SYMPOSIUM ON CYBERSPACE SAFETY AND SECURITY, AND 2015 IEEE 12TH INTERNATIONAL CONFERENCE ON EMBEDDED SOFTWARE AND SYSTEMS (ICESS) | 2015
Keywords
RDF; data partitioning; hybrid hierarchical clustering
DOI
10.1109/HPCC-CSS-ICESS.2015.190
Chinese Library Classification number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
Data partitioning is a fundamental step toward effective storage and querying of big RDF data. This paper presents BRDPHHC, a balanced RDF data partitioning algorithm based on hybrid hierarchical clustering that combines affinity propagation (AP) and K-means clustering. BRDPHHC works in three stages: (i) a pre-processing step that combines node compression and node removal to reduce the number of raw data points; (ii) AP clustering, which coarsens the RDF graph step by step and produces data blocks; and (iii) K-means clustering, which performs the final data partitioning. Experiments on benchmark datasets demonstrate the effectiveness of the proposed scheme.
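The coarsen-then-partition idea in stages (ii) and (iii) can be illustrated with a small Python sketch. It is not the authors' implementation. It assumes each RDF node (after pre-processing, which is omitted here) has already been mapped to a numeric feature vector, uses negative squared Euclidean distance as the AP similarity with the median similarity as the shared preference, and runs plain Lloyd's K-means over the block centroids. The helper names (`affinity_propagation`, `kmeans`, `partition`) and all parameter values are hypothetical, and because the abstract does not describe how BRDPHHC enforces balance, no balancing step is modeled.

```python
from statistics import median


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def affinity_propagation(points, damping=0.5, max_iter=200, conv_iter=15):
    """Standard affinity propagation (Frey & Dueck, 2007); needs at least 2 points.

    Assumption: similarity = negative squared distance. Returns, for each point,
    the index of the exemplar (data block representative) it is assigned to.
    """
    n = len(points)
    s = [[-sq_dist(points[i], points[k]) for k in range(n)] for i in range(n)]
    # Shared preference: median off-diagonal similarity (a common default).
    pref = median(s[i][k] for i in range(n) for k in range(n) if i != k)
    for i in range(n):
        s[i][i] = pref
    r = [[0.0] * n for _ in range(n)]  # responsibilities
    a = [[0.0] * n for _ in range(n)]  # availabilities
    exemplars, stable = [], 0
    for _ in range(max_iter):
        # r(i,k) = s(i,k) - max_{k' != k} (a(i,k') + s(i,k')), damped.
        for i in range(n):
            vals = [a[i][k] + s[i][k] for k in range(n)]
            best = max(range(n), key=vals.__getitem__)
            second = max(vals[k] for k in range(n) if k != best)
            for k in range(n):
                new = s[i][k] - (second if k == best else vals[best])
                r[i][k] = damping * r[i][k] + (1 - damping) * new
        # a(k,k) = sum_{i' != k} max(0, r(i',k));
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k))), damped.
        for k in range(n):
            pos = [max(0.0, r[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]
            for i in range(n):
                new = total if i == k else min(0.0, r[k][k] + total - pos[i])
                a[i][k] = damping * a[i][k] + (1 - damping) * new
        current = [k for k in range(n) if a[k][k] + r[k][k] > 0]
        stable = stable + 1 if current == exemplars else 0
        exemplars = current
        if exemplars and stable >= conv_iter:
            break
    if not exemplars:  # fallback: keep the single strongest candidate
        exemplars = [max(range(n), key=lambda k: a[k][k] + r[k][k])]
    # Exemplars represent themselves; other points join their most similar exemplar.
    return [i if i in exemplars else max(exemplars, key=lambda k: s[i][k])
            for i in range(n)]


def kmeans(points, k, iters=100):
    """Lloyd's K-means; deterministic init from the first k points (illustrative only)."""
    k = min(k, len(points))  # cannot form more clusters than there are blocks
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new.append([sum(col) / len(members) for col in zip(*members)]
                       if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    return labels


def partition(points, k):
    """Hypothetical pipeline: AP coarsens points into blocks, K-means groups the blocks."""
    assign = affinity_propagation(points)
    blocks = sorted(set(assign))
    members = {b: [i for i, e in enumerate(assign) if e == b] for b in blocks}
    centroids = [[sum(col) / len(members[b]) for col in zip(*(points[i] for i in members[b]))]
                 for b in blocks]
    block_part = kmeans(centroids, k)
    part = [0] * len(points)
    for b, p in zip(blocks, block_part):
        for i in members[b]:
            part[i] = p  # every node inherits its block's partition
    return part


if __name__ == "__main__":
    # Toy input: six 1-D "node features" forming two well-separated groups.
    pts = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (12.0,)]
    print(affinity_propagation(pts), partition(pts, 2))
```

Running K-means on block centroids rather than on raw nodes reflects the hierarchical idea in the abstract: the expensive fine-grained grouping happens once in AP, and the final partitioning step only sees a much smaller set of blocks.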
Pages: 1755-1760
Number of pages: 6