Efficient Multi-threaded Crawling Using In Memory Data Structures

Cited by: 0
Author
Abdeen, Mohammad A. R. [1]
Affiliation
[1] Islamic Univ Madinah, Fac Comp & Informat Syst, Madinah, Saudi Arabia
Keywords
Web Crawlers; Distributed Applications; Multi-threading; In-memory Data Structures; Performance Evaluation
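DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract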
Crawling the internet is an important task for any search engine. A crawler is a software program that sends HTTP requests to web servers across the internet and downloads their contents. As the internet has grown explosively over the last decade, designing efficient parallel crawlers has become a necessity. One factor that degrades crawler performance is the disk access incurred every time a file is written. Because crawling the web requires downloading tens or hundreds of millions of web pages, much time is consumed by disk writes as a result of seek times. This work presents an efficient multi-threaded crawler that incorporates an in-memory data structure to reduce the overall disk write time. The results show that, at selected sizes of the in-memory data structure, the proposed technique can increase throughput by about 50% over a conventional multi-threaded crawler with no in-memory data structure. In addition, the results show that this design can achieve an average crawl speed of 22 pages/sec, which exceeds the speeds reported in previous work.
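The abstract does not say which in-memory data structure the crawler uses or how it is flushed, so the following is only a minimal sketch of the general idea, assuming a lock-protected list that worker threads append to and that is written to disk in one batch once it reaches a chosen capacity. The names `PageBuffer`, `fake_fetch`, `crawl`, and `demo` are hypothetical, and `fake_fetch` stands in for a real HTTP request so the sketch runs offline.

```python
import os
import tempfile
import threading
from concurrent.futures import ThreadPoolExecutor


class PageBuffer:
    """Hypothetical thread-safe buffer: holds pages in memory, writes in batches."""

    def __init__(self, path, capacity):
        self.path = path
        self.capacity = capacity   # pages held in memory before a flush (assumed knob)
        self.pages = []
        self.flush_count = 0       # number of batched disk writes performed
        self.lock = threading.Lock()

    def add(self, url, body):
        with self.lock:
            self.pages.append((url, body))
            if len(self.pages) >= self.capacity:
                self._flush_locked()

    def flush(self):
        with self.lock:
            self._flush_locked()

    def _flush_locked(self):
        # Caller must hold self.lock. One file open per batch instead of per page.
        if not self.pages:
            return
        with open(self.path, "a", encoding="utf-8") as f:
            for url, body in self.pages:
                # Store URL and body length only, to keep one record per line.
                f.write(f"{url}\t{len(body)}\n")
        self.pages.clear()
        self.flush_count += 1


def fake_fetch(url):
    # Stand-in for an HTTP GET; a real crawler would download the page here.
    return f"<html>{url}</html>"


def crawl(urls, buffer, workers=4):
    def work(url):
        buffer.add(url, fake_fetch(url))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(work, urls))   # consume the iterator so errors surface
    buffer.flush()                   # write whatever is still in memory


def demo(n_pages=100, capacity=25):
    fd, path = tempfile.mkstemp(suffix=".tsv")
    os.close(fd)
    try:
        buf = PageBuffer(path, capacity)
        crawl([f"http://example.test/{i}" for i in range(n_pages)], buf)
        with open(path, encoding="utf-8") as f:
            records = f.read().splitlines()
        return len(records), buf.flush_count
    finally:
        os.remove(path)
```

Calling `demo(100, 25)` stores 100 records using 4 batched writes, whereas `demo(100, 1)` degenerates to one write per page. This sketch holds the lock during the disk write for simplicity, which stalls other workers while a batch is flushed; a design that swaps in a fresh buffer before writing would avoid that, and nothing here reflects the paper's measured throughput.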
Pages: 88-92
Page count: 5