W-tree: A Compact External Memory Representation for Webgraphs

被引:0
作者
Avila, Bruno T. [1 ]
Lins, Rafael D. [2 ]
机构
[1] Univ Fed Pernambuco, Dept Informat Sci, BR-50670901 Recife, PE, Brazil
[2] Univ Fed Pernambuco, Informat Ctr, BR-50670901 Recife, PE, Brazil
关键词
Webgraph; representation; data structure; compression; external memory; GRAPH COMPRESSION; FAST ACCESS; WEB; MODELS;
D O I
10.1145/2835181
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
World Wide Web applications need to use, constantly update, and maintain large webgraphs for executing several tasks, such as calculating the web impact factor, finding hubs and authorities, performing link analysis by webometrics tools, and ranking webpages by web search engines. Such webgraphs need to use a large amount of main memory, and, frequently, they do not completely fit in, even if compressed. Therefore, applications require the use of external memory. This article presents a new compact representation for webgraphs, called w-tree, which is designed specifically for external memory. It supports the execution of basic queries (e.g., full read, random read, and batch random read), set-oriented queries (e.g., superset, subset, equality, overlap, range, inlink, and co-inlink), and some advanced queries, such as edge reciprocal and hub and authority. Furthermore, a new layout tree designed specifically for webgraphs is also proposed, reducing the overall storage cost and allowing the random read query to be performed with an asymptotically faster runtime in the worst case. To validate the advantages of the w-tree, a series of experiments are performed to assess an implementation of the w-tree comparing it to a compact main memory representation. The results obtained show that w-tree is competitive in compression time and rate and in query time, which may execute several orders of magnitude faster for set-oriented queries than its competitors. The results provide empirical evidence that it is feasible to use a compact external memory representation for webgraphs in real applications, contradicting the previous assumptions made by several researchers.
引用
收藏
页数:36
相关论文
共 43 条
[1]   Towards compressing Web graphs [J].
Adler, M ;
Mitzenmacher, M .
DCC 2001: DATA COMPRESSION CONFERENCE, PROCEEDINGS, 2001, :203-212
[2]   Internet -: Diameter of the World-Wide Web [J].
Albert, R ;
Jeong, H ;
Barabási, AL .
NATURE, 1999, 401 (6749) :130-131
[3]  
ALMIND TC, 1997, J DOC, V53, P4
[4]  
[Anonymous], 2007, P 16 INT C WORLD WID
[5]   Graph Compression by BFS [J].
Apostolico, Alberto ;
Drovandi, Guido .
ALGORITHMS, 2009, 2 (03) :1031-1044
[6]  
Asano Y, 2008, LECT NOTES COMPUT SC, V5092, P1
[7]  
Avila B. T., 2011, THESIS PUC RIO
[8]   Cache-oblivious B-trees [J].
Bender, MA ;
Demaine, ED ;
Farach-Colton, M .
SIAM JOURNAL ON COMPUTING, 2005, 35 (02) :341-358
[9]  
Berners-Lee Tim, 2006, Foundations and Trends in Web Science, V1, P1, DOI 10.1561/1800000001
[10]   The Connectivity Server: fast access to linkage information on the Web [J].
Bharat, K ;
Broder, A ;
Henzinger, M ;
Kumar, P ;
Venkatasubramanian, S .
COMPUTER NETWORKS AND ISDN SYSTEMS, 1998, 30 (1-7) :469-477