Design and analyses of web scraping on burstable virtual machines

Authors
Drummond, Lucia Maria A. [1]
Andrade, Luciano [1]
Muniz, Pedro de Brito [1]
Pereira, Matheus Marotti [1]
Silva, Thiago do Prado [1]
Teylo, Luan [1,2]
Affiliations
[1] Univ Fed Fluminense UFF, Inst Computacao, Niteroi, Brazil
[2] INRIA, Bordeaux, France
Source
Concurrency and Computation: Practice and Experience
Keywords
Burstable instances; cloud computing; web scraping
DOI
10.1002/cpe.7999
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Web scraping is a widely used technique for collecting and structuring public data from the internet to support decision-making. As the volume of data grows, more efficient data extraction methods become crucial. This article introduces a novel web scraping framework that uses Burstable virtual machines (VMs) on Amazon Web Services (AWS) to reduce the monetary cost of execution while complying with service level agreements (SLAs). To this end, the framework runs on a mixed cluster of fixed and temporary Burstable VMs, which is scaled up elastically to meet the SLA and scaled down to minimize monetary cost. Two VM allocation strategies are proposed and evaluated: (i) a queue- and SLA-based strategy, which uses the queue size and SLA criteria to determine how many VMs the current scraping requests require, and (ii) a credit-based strategy, which incorporates Burstable VM credit information to manage instance creation and termination. Experiments show that the framework meets the defined SLA while reducing cost by up to 74% compared with executing on fixed-size clusters of Burstable instances.
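The abstract describes the two allocation strategies only at a high level. The following is a minimal Python sketch, not the authors' implementation, of how queue/SLA information and credit balances could feed a scale-up/scale-down decision for a mixed cluster. Everything in it is a hypothetical assumption for illustration: the names (`BurstableVM`, `vms_for_sla`, `vm_capacity`, `scaling_decision`), the linear per-request time model, the `min_credits` threshold, and the `baseline` fraction at which a credit-exhausted VM is assumed to run. It also folds both ideas into one rule, whereas the paper evaluates them as separate strategies.

```python
import math
from dataclasses import dataclass


@dataclass
class BurstableVM:
    """Hypothetical snapshot of one Burstable VM in the mixed cluster."""
    name: str
    fixed: bool          # True for the fixed part of the cluster (never terminated)
    cpu_credits: float   # current credit balance, in whatever unit the monitor reports


def vms_for_sla(queue_size: int, secs_per_request: float, sla_secs: float) -> int:
    """Queue/SLA idea: full-speed VMs needed to drain the queue within the SLA.

    Assumes (hypothetically) that every request costs secs_per_request on one VM.
    """
    if queue_size <= 0:
        return 0
    return math.ceil(queue_size * secs_per_request / sla_secs)


def vm_capacity(vm: BurstableVM, baseline: float, min_credits: float) -> float:
    """Credit idea: a VM at or below min_credits is assumed to run at its baseline fraction."""
    return 1.0 if vm.cpu_credits > min_credits else baseline


def scaling_decision(
    vms: list[BurstableVM],
    queue_size: int,
    secs_per_request: float,
    sla_secs: float,
    baseline: float = 0.25,      # placeholder fraction, not an AWS figure
    min_credits: float = 1.0,    # placeholder threshold
) -> tuple[int, list[str]]:
    """Return (number of VMs to launch, names of temporary VMs to terminate)."""
    needed = vms_for_sla(queue_size, secs_per_request, sla_secs)
    capacity = sum(vm_capacity(vm, baseline, min_credits) for vm in vms)
    if capacity < needed:
        # Scale up; newly launched VMs are assumed to run at full speed.
        return math.ceil(needed - capacity), []
    # Scale down: release temporary VMs, lowest credit balance first,
    # as long as the remaining capacity still covers the SLA estimate.
    to_terminate = []
    for vm in sorted((v for v in vms if not v.fixed), key=lambda v: v.cpu_credits):
        cap = vm_capacity(vm, baseline, min_credits)
        if capacity - cap >= needed:
            capacity -= cap
            to_terminate.append(vm.name)
    return 0, to_terminate


if __name__ == "__main__":
    cluster = [
        BurstableVM("fixed-1", fixed=True, cpu_credits=50.0),
        BurstableVM("temp-1", fixed=False, cpu_credits=0.0),
        BurstableVM("temp-2", fixed=False, cpu_credits=30.0),
    ]
    print(scaling_decision(cluster, queue_size=300, secs_per_request=2.0, sla_secs=120.0))  # (3, [])
    print(scaling_decision(cluster, queue_size=30, secs_per_request=2.0, sla_secs=120.0))   # (0, ['temp-1', 'temp-2'])
```

Releasing low-credit VMs first reflects the intuition that a throttled burstable VM contributes little throughput; the abstract does not say whether the paper's credit-based strategy makes this particular choice.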
Pages: 13