Efficient Multi-threaded Crawling Using In Memory Data Structures

Times cited: 0
Author
Abdeen, Mohammad A. R. [1]
Affiliation
[1] Islamic Univ Madinah, Fac Comp & Informat Syst, Madinah, Saudi Arabia
Keywords
Web Crawlers; Distributed Applications; Multi-threading; In-memory Data Structures; Performance Evaluation
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
Crawling the internet is an important task for any search engine. A crawler is a software program that sends HTTP requests to web servers across the internet and downloads their content. As the internet has grown dramatically over the last decade, designing efficient parallel crawlers has become a necessity. One factor that degrades crawler performance is the disk access incurred every time a file is written. Because crawling the web requires downloading tens or hundreds of millions of web pages, a great deal of time is spent on disk writes due to seek times. This work presents an efficient multi-threaded crawler that incorporates an in-memory data structure to reduce the overall disk write time. The results show that, for selected sizes of the in-memory data structure, the proposed technique can increase throughput by about 50% over a conventional multi-threaded crawler without an in-memory data structure. In addition, the design achieves an average crawling speed of 22 pages/sec, which exceeds the speeds reported in previous work.
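The abstract does not say which in-memory data structure the crawler uses or when it is written out. As a minimal sketch, assuming a simple list buffer that is flushed to disk in one batch once it holds a fixed number of pages, the idea can be illustrated in Python. `BufferedPageStore`, `fetch_page`, `crawl`, and the `capacity` parameter are hypothetical names, and `fetch_page` returns synthetic HTML instead of performing a real HTTP request so the example runs offline.

```python
import os
import tempfile
import threading
from concurrent.futures import ThreadPoolExecutor


class BufferedPageStore:
    """Hypothetical in-memory buffer: downloaded pages accumulate in RAM
    and are written to disk in one batch once `capacity` pages are held."""

    def __init__(self, path, capacity):
        self.path = path
        self.capacity = capacity
        self._buffer = []
        self._lock = threading.Lock()
        self.disk_writes = 0  # number of batched flushes to disk

    def add(self, url, content):
        # Called concurrently by crawler threads; the lock protects the buffer.
        with self._lock:
            self._buffer.append((url, content))
            if len(self._buffer) >= self.capacity:
                self._flush_locked()

    def flush(self):
        # Write out whatever is still buffered (e.g. at the end of a crawl).
        with self._lock:
            self._flush_locked()

    def _flush_locked(self):
        if not self._buffer:
            return
        # One open/append per batch instead of one per downloaded page.
        with open(self.path, "a", encoding="utf-8") as f:
            for url, content in self._buffer:
                f.write(f"{url}\t{len(content)}\n{content}\n")
        self.disk_writes += 1
        self._buffer.clear()


def fetch_page(url):
    # Hypothetical stand-in for an HTTP GET; a real crawler might use
    # urllib.request.urlopen(url).read() here.
    return f"<html><body>{url}</body></html>"


def crawl(urls, store, num_threads=8):
    def worker(url):
        store.add(url, fetch_page(url))

    # A pool of worker threads downloads pages in parallel.
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        list(pool.map(worker, urls))  # list() surfaces worker exceptions
    store.flush()


if __name__ == "__main__":
    out = os.path.join(tempfile.mkdtemp(), "pages.txt")
    store = BufferedPageStore(out, capacity=100)
    crawl([f"http://example.com/page{i}" for i in range(1000)], store)
    print(store.disk_writes)  # 10 batched writes for 1,000 pages
```

Here `capacity` plays the role of the "size of the in-memory data structure" that the abstract varies: with 1,000 pages and a capacity of 100, the store performs 10 batched writes instead of one write per page. Holding the lock during a flush keeps each record intact but stalls the other threads while writing; a different design might swap the buffer out under the lock and hand it to a dedicated writer thread.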
Pages: 88-92
Number of pages: 5