Extracting knowledge from the World Wide Web

Cited by: 29
Authors
Henzinger, M [1]
Lawrence, S [1]
Affiliation
[1] Google Inc, Mountain View, CA 94043 USA
DOI
10.1073/pnas.0307528100
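Chinese Library Classification
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [Natural Sciences, General]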
Subject classification codes
07; 0710; 09
Abstract
The World Wide Web provides an unprecedented opportunity to automatically analyze a large sample of interests and activity in the world. We discuss methods for extracting knowledge from the web by randomly sampling and analyzing hosts and pages, and by analyzing the link structure of the web and how links accumulate over time. A variety of interesting and valuable information can be extracted, such as the distribution of web pages over domains, the distribution of interest in different areas, communities related to different topics, the nature of competition in different categories of sites, and the degree of communication between different communities or countries.
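The abstract mentions estimating the distribution of web pages over domains by randomly sampling pages. The following is a minimal sketch of only the tabulation step. It assumes that a uniform random sample of page URLs is already available and does not show how such a sample is drawn. The function names `tld_of` and `domain_distribution` are hypothetical and are not taken from the paper. The standard error uses the usual binomial approximation, sqrt(p(1 - p) / n).

```python
import math
from collections import Counter
from urllib.parse import urlsplit


def tld_of(url: str) -> str:
    """Return the last label of the URL's hostname, e.g. 'com' or 'de'."""
    host = urlsplit(url).hostname or ""  # hostname is lowercased, or None
    labels = host.rstrip(".").split(".")  # drop a trailing FQDN dot first
    return labels[-1]


def domain_distribution(sample_urls: list[str]) -> dict[str, tuple[float, float]]:
    """Estimate (share, standard error) of pages per top-level domain.

    Assumes sample_urls is a uniform random sample of pages; if the sample
    is biased (e.g. taken from a crawl frontier), the estimate is biased too.
    """
    counts = Counter(tld_of(u) for u in sample_urls)
    n = sum(counts.values())
    if n == 0:
        return {}
    result = {}
    for tld, c in counts.most_common():
        p = c / n
        # Binomial standard error of a sample proportion.
        result[tld] = (p, math.sqrt(p * (1 - p) / n))
    return result


sample = [
    "http://www.example.com/a",
    "http://example.org/",
    "http://www.example.com/b",
    "http://www.uni-example.de/x",
]
print(domain_distribution(sample))
```

With this toy sample of four URLs (two under `.com`, one each under `.org` and `.de`) the sketch reports shares of 0.5, 0.25 and 0.25. The standard errors are large at this sample size, so a useful estimate needs a much larger sample.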
Pages: 5186-5191
Number of pages: 6