Studying the XML Web: Gathering statistics from an XML sample

Cited by: 27
Authors
Barbosa, D
Mignet, L
Veltri, P
Affiliations
[1] Univ Toronto, Dept Comp Sci, Toronto, ON M5S 3G5, Canada
[2] IBM India Res Lab, New Delhi 110016, India
[3] Magna Graecia Univ Catanzaro, Dept Expt & Clin Med, I-88100 Catanzaro, Italy
[4] INRIA, Paris, France
Source
WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS | 2005 / Vol. 8 / No. 4
Keywords
World Wide Web; XML; XML web; XML Documents; XML processing tools
DOI
10.1007/s11280-005-1544-y
CLC classification
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
XML has emerged as the language for exchanging data on the web and has attracted considerable interest both in industry and in academia. Nevertheless, to date, little is known about the XML documents published on the web. This paper presents a comprehensive analysis of a sample of about 200,000 XML documents on the web and is the first study of its kind. We study the distribution of XML documents across the web in several ways; moreover, we provide a detailed characterization of the structure of real XML documents. Our results provide valuable input to the design of algorithms, tools and systems that use XML in one form or another.
Pages: 413-438
Page count: 26