Using Web Pages Dynamicity to Prioritise Web Crawling

Cited by: 1
Authors
Alderratia, Nisreen [1]
Elsheh, Mohammed [1]
Affiliations
[1] Libyan Acad Misurata, Third Ring Rd, Misurata, Libya
Source
PROCEEDINGS OF THE 2019 2ND INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND MACHINE INTELLIGENCE (MLMI 2019) | 2019
Keywords
Web crawler; importance metric; dynamicity
DOI
10.1145/3366750.3366757
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Web crawling collects pages from the web so that they can be indexed and used to return search results that match users' requirements. Web crawlers must also revisit pages continually to keep the search engine database up to date. Because crawling is constrained by time limits and network issues, it is essential to determine which pages are most important and should be recrawled first. This research therefore introduces a method that tells the crawler in what order to recrawl previously crawled pages, so that more important and valuable pages are acquired earlier than others. The researchers propose a crawling strategy based on topic similarity combined with the dynamicity of web pages: the crawler downloads relevant pages and recrawls them repeatedly, and each time a change is detected in a page, that page's counter is incremented. A page that is relevant and changes frequently is thus considered important and is given high priority in the crawling process. The results indicate that web page dynamicity is an effective basis for prioritising pages during crawling: the most dynamic pages, whose content is most likely to have changed, are obtained before the least dynamic ones.
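The counter-based prioritisation described in the abstract can be illustrated with a minimal sketch. The abstract does not say how changes are detected, how topic similarity is computed, or how relevance and change count are combined, so everything concrete here is an assumption: the names `PageRecord`, `record_visit`, and `recrawl_order` are hypothetical, a change is detected by comparing SHA-256 digests of page text, relevance is a precomputed score in [0, 1], and the 0.5 threshold is arbitrary.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class PageRecord:
    url: str
    relevance: float        # assumed: topic similarity in [0, 1], computed elsewhere
    last_digest: str = ""   # digest of the content seen on the previous visit
    change_count: int = 0   # incremented each time the content differs


def content_digest(text: str) -> str:
    """Assumed change detector: an exact digest of the page text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def record_visit(page: PageRecord, text: str) -> bool:
    """Update the page's counter after a (re)crawl; return True on a change."""
    digest = content_digest(text)
    # The first visit has nothing to compare against, so it is not a change.
    changed = bool(page.last_digest) and digest != page.last_digest
    if changed:
        page.change_count += 1
    page.last_digest = digest
    return changed


def recrawl_order(pages, min_relevance=0.5):
    """Relevant pages only, most frequently changed first.

    Ties on change count are broken by relevance (an assumption).
    """
    relevant = [p for p in pages if p.relevance >= min_relevance]
    return sorted(relevant, key=lambda p: (p.change_count, p.relevance), reverse=True)
```

In this sketch a page that is both relevant and frequently changing rises to the front of the recrawl queue, while pages below the relevance threshold are never scheduled, however often they change.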
Pages: 40-44
Page count: 5