Codes for the World Wide Web

被引:25
作者
Boldi, Paolo [1 ]
Vigna, Sebastiano [1 ]
机构
[1] Univ Studi Milano, Dipartimento Sci Informazione, Via Comelico 39-41, I-20135 Milan, Italy
关键词
D O I
10.1080/15427951.2005.10129113
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
We introduce a new family of simple, complete instantaneous codes for positive integers, called zeta codes, which are suitable for integers distributed as a power law with small exponent (smaller than 2). The main motivation for the introduction of zeta codes comes from web-graph compression: if nodes are numbered according to URL lexicographical order, gaps in successor lists are distributed according to a power law with small exponent. We give estimates of the expected length of zeta codes against power-law distributions, and compare the results with analogous estimates for the more classical gamma delta and variable-length block codes.
引用
收藏
页码:407 / 429
页数:23
相关论文
共 11 条
[1]   Towards compressing Web graphs [J].
Adler, M ;
Mitzenmacher, M .
DCC 2001: DATA COMPRESSION CONFERENCE, PROCEEDINGS, 2001, :203-212
[2]   UbiCrawler: a scalable fully distributed Web crawler [J].
Boldi, P ;
Codenotti, B ;
Santini, M ;
Vigna, S .
SOFTWARE-PRACTICE & EXPERIENCE, 2004, 34 (08) :711-726
[3]  
Boldi Paolo, 2004, P 13 INT C ONWORLD W, P595
[4]   Graph structure in the Web [J].
Broder, A ;
Kumar, R ;
Maghoul, F ;
Raghavan, P ;
Rajagopalan, S ;
Stata, R ;
Tomkins, A ;
Wiener, J .
COMPUTER NETWORKS-THE INTERNATIONAL JOURNAL OF COMPUTER AND TELECOMMUNICATIONS NETWORKING, 2000, 33 (1-6) :309-320
[5]  
Graham R.L., 1994, CONCRETE MATH, Vsecond
[6]  
LEVENSHTEIN VI, 1968, PROBL CYBERN, V20, P173
[7]  
Moffat A, 1995, LECT NOTES COMPUT SC, V955, P393
[8]  
Molteni Giuseppe, 2004, CLASS LACUNARY SERIE
[9]  
Randall Keith, 2001, 175 COMPAQ SYSTEMS R
[10]  
Rudin W., 1986, REAL COMPLEX ANAL, V3