The Performance Analysis of Distributed Storage Systems Used in Scalable Web Systems

Authors
Oles, Dominik [1]
Nowak, Ziemowit [2]
Affiliations
[1] Tieto Czech Sro, 28 Rijna 3346-91, Ostrava 70200, Czech Republic
[2] Wroclaw Univ Sci & Technol, Fac Comp Sci & Management, Wybrzeze Wyspianskiego 27, PL-50370 Wroclaw, Poland
Source
INFORMATION SYSTEMS ARCHITECTURE AND TECHNOLOGY, ISAT 2018, PT I | 2019 / Vol. 852
Keywords
Big Data; Hadoop; HBase; Kudu
DOI
10.1007/978-3-319-99981-4_27
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Scalable web systems are closely tied to the distributed storage systems used to process large amounts of data (big data). One example is Hadoop, with its many extensions supporting data storage, such as SQL-on-Hadoop systems and the Parquet file format. Another class of systems for storing and processing big data is NoSQL databases, such as HBase, which are used in applications requiring fast random access. The Kudu system was created to combine the advantages of Hadoop and HBase, enabling both efficient analysis of data sets and fast random access. This research analyzes the performance of these systems. The experiment was conducted in the Amazon Web Services public cloud, on a cluster of nine virtual machines. The test data was a fragment of the public "Wikipedia Page Traffic Statistics" dataset containing about one billion rows. The measurement results confirm that Kudu is a promising alternative to the commonly used technologies.
Pages: 287-298
Page count: 12