Performance Comparison of Big Data Processing Utilizing SciDB and Apache Accumulo Databases

Cited by: 1
Authors
Abu Mhana, Mohammad [1]
Khalifeh, Ala' [1]
Alouneh, Sahel [1, 2]
Affiliations
[1] German Jordanian Univ, Amman, Jordan
[2] Al Ain Univ, Al Ain, U Arab Emirates
Source
2022 SEVENTH INTERNATIONAL CONFERENCE ON FOG AND MOBILE EDGE COMPUTING, FMEC | 2022
Keywords
Distributed Databases; Apache Accumulo; Hadoop; SciDB; PostgreSQL; Big Data;
DOI
10.1109/FMEC57183.2022.10062513
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Big data involves processing massive, complex data sets that carry a tremendous amount of information. Researchers have therefore created several methods, models, and databases to handle such data. Among them is the Apache Accumulo database, which is considered an in-storage database that relies on the Hadoop processing framework to analyze and process the data. Another big data database widely used in the research community is SciDB, short for scientific database. SciDB uses a PostgreSQL connection to establish a reliable link with the database. In this paper, we analyze and evaluate the performance of these two database systems, which specialize in handling big data and storing it for further processing and analysis. Performance is analyzed in terms of several metrics such as CPU utilization, data storing/retrieval delay, disk utilization, and the number of data ingestions per second. Furthermore, the setup and integration of the two databases are investigated. Our performance evaluation revealed the advantages and disadvantages of each database structure: Apache Accumulo outperforms SciDB in terms of average ingestion execution time, number of ingestions per second, and CPU utilization, whereas SciDB has lower disk utilization than Apache Accumulo.
Pages: 17-21
Number of pages: 5