Efficient Multi-threaded Crawling Using In Memory Data Structures

Author
Abdeen, Mohammad A. R. [1]
Affiliation
[1] Islamic Univ Madinah, Fac Comp & Informat Syst, Madinah, Saudi Arabia
Keywords
Web Crawlers; Distributed Applications; Multi-threading; In-memory Data Structures; Performance Evaluation;
DOI
Not available
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Crawling the internet is an essential task for any search engine. A crawler is a software program that sends HTTP requests to web servers across the internet and downloads their contents. As the internet has grown explosively over the last decade, designing efficient parallel crawlers has become a necessity. One factor that degrades crawler performance is the disk access incurred every time a file is written. Since crawling the web requires downloading tens or hundreds of millions of webpages, much time is spent on disk writes owing to seek times. This work presents an efficient multi-threaded crawler that incorporates an in-memory data structure to reduce overall disk write time. The results show that, at selected sizes of the in-memory data structure, the proposed technique increases throughput by about 50% over a conventional multi-threaded crawler with no in-memory data structure. In addition, the results show that this design achieves an average crawler speed of 22 pages/sec, which exceeds previously reported work.
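The idea described in the abstract, holding downloaded pages in memory and writing them to disk in batches rather than one file per page, can be illustrated with a minimal sketch. This is not the author's implementation: the names `PageBuffer`, `fake_fetch`, and `crawl`, the tab-separated output format, and the `capacity` parameter are all assumptions made for illustration, and the HTTP request is replaced by a stub so the example runs offline.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class PageBuffer:
    """Hypothetical thread-safe buffer that flushes pages to disk in batches."""

    def __init__(self, out_path, capacity):
        self.out_path = out_path
        self.capacity = capacity  # pages held in memory before a flush (assumed knob)
        self.pages = []
        self.lock = threading.Lock()
        self.flush_count = 0

    def add(self, url, body):
        # Crawler threads call this instead of writing each page to its own file.
        with self.lock:
            self.pages.append((url, body))
            if len(self.pages) >= self.capacity:
                self._flush_locked()

    def close(self):
        # Write out whatever is left once crawling has finished.
        with self.lock:
            if self.pages:
                self._flush_locked()

    def _flush_locked(self):
        # One append-mode write per batch; the caller must hold self.lock.
        with open(self.out_path, "a", encoding="utf-8") as f:
            for url, body in self.pages:
                f.write(f"{url}\t{body}\n")
        self.pages.clear()
        self.flush_count += 1


def fake_fetch(url):
    # Stand-in for an HTTP GET; a real crawler would download the page here.
    return f"<html>{url}</html>"


def crawl(urls, out_path, capacity=4, workers=3):
    """Fetch urls on a thread pool and return how many disk flushes occurred."""
    buf = PageBuffer(out_path, capacity)

    def worker(url):
        buf.add(url, fake_fetch(url))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(worker, urls))  # consume the iterator so all tasks finish
    buf.close()
    return buf.flush_count
```

In this sketch a larger `capacity` means fewer disk writes at the cost of more memory, which is the kind of trade-off the abstract alludes to when it reports gains "at selected sizes" of the in-memory structure. The sketch makes no claim about the actual data structure or sizes used in the paper.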
Pages: 88-92
Page count: 5