Logical Schema for Data Warehouse on Column-Oriented NoSQL Databases

Cited by: 18
Authors
Boussahoua, Mohamed [1]
Boussaid, Omar [1]
Bentayeb, Fadila [1]
Affiliations
[1] Univ Lumiere Lyon 2, ERIC, EA 3083, 5 Ave Pierre Mendes France, F-69676 Bron, France
Source
DATABASE AND EXPERT SYSTEMS APPLICATIONS, DEXA 2017, PT II | 2017 / Vol. 10439
Keywords
Data warehouses; NoSQL databases; Columns family;
DOI
10.1007/978-3-319-64471-4_20
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Column-oriented NoSQL systems offer a flexible, highly denormalized data schema that facilitates data warehouse scalability. However, implementing a data warehouse on a NoSQL database is challenging because it involves a distributed data management policy on multi-node clusters. In column-oriented NoSQL systems, query performance can be improved by careful data grouping. In this paper, we present a method that uses clustering techniques, in particular k-means, to derive better column families from existing fact and dimension tables. To validate our method, we adopt the TPC-DS benchmark. We conducted several experiments to examine the benefits of clustering techniques for creating column families in HBase, a column-oriented NoSQL database on the Hadoop platform. Our experiments suggest that defining a good data grouping in HBase when implementing a data warehouse significantly improves the performance of decision-support queries.
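The grouping idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which features are clustered, so the sketch assumes each attribute is described by a 0/1 vector marking which workload queries use it. The attribute names, the toy workload, and the helpers `kmeans` and `column_families` are hypothetical.

```python
# Hypothetical sketch: group table attributes into column families with k-means.
# Assumption (not stated in the abstract): each attribute is represented by a
# 0/1 usage vector over a query workload; attributes used together should end
# up in the same column family.

def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means; centroids start at the first k distinct points."""
    centroids = []
    for p in points:
        if list(p) not in centroids:
            centroids.append(list(p))
        if len(centroids) == k:
            break
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(len(centroids)),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its assigned points.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def column_families(usage, k):
    """Map {attribute: usage vector} to a sorted list of attribute groups."""
    names = sorted(usage)
    labels = kmeans([usage[n] for n in names], k)
    groups = {}
    for name, label in zip(names, labels):
        groups.setdefault(label, []).append(name)
    return sorted(groups.values())


# Toy workload of four queries; a 1 means the query reads the attribute.
usage = {
    "ss_quantity":    [1, 1, 0, 0],
    "ss_sales_price": [1, 1, 0, 0],
    "ss_customer_sk": [0, 0, 1, 1],
    "ss_store_sk":    [0, 0, 1, 1],
}
families = column_families(usage, k=2)
print(families)
```

Each resulting group would then be declared as one HBase column family, so that attributes read together are stored together.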
Pages: 247 - 256
Page count: 10
Related papers
13 in total
  • [1] Abell A., 2011, P ACM 14 INT WORKSH, P17
  • [2] Brewer Eric A., 2000, Towards Robust Distributed Systems (Invited Talk), V7
  • [3] Implementation of Multidimensional Databases in Column-Oriented NoSQL Systems
    Chevalier, Max
    El Malki, Mohammed
    Kopliku, Arlind
    Teste, Olivier
    Tournier, Ronan
    [J]. ADVANCES IN DATABASES AND INFORMATION SYSTEMS, ADBIS 2015, 2015, 9282 : 79 - 91
  • [4] Chongxin Li, 2010, 2010 IEEE International Conference on Software Engineering and Service Sciences (ICSESS 2010), P683, DOI 10.1109/ICSESS.2010.5552465
  • [5] Dehdouh K., 2015, International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA'15), P469
  • [6] DERRAR H., 2015, Encyclopedia of Information Science and Technology, P1949, DOI 10.4018/978-1-4666-5888-2.ch188
  • [7] MacQueen, 1967, BERK S MATH STAT PRO, DOI 10.1007/S11665-016-2173-6
  • [8] VERTICAL PARTITIONING ALGORITHMS FOR DATABASE DESIGN
    NAVATHE, S
    CERI, S
    WIEDERHOLD, G
    DOU, J
    [J]. ACM TRANSACTIONS ON DATABASE SYSTEMS, 1984, 9 (04): 680 - 710
  • [9] Padhy R. P., 2011, INT J ADV ENG SCI TE, V11, P15
  • [10] Physical Data Warehouse Design on NoSQL Databases OLAP Query Processing over HBase
    Scabora, Lucas C.
    Brito, Jaqueline J.
    Ciferri, Ricardo Rodrigues
    de Aguiar Ciferri, Cristina Dutra
    [J]. PROCEEDINGS OF THE 18TH INTERNATIONAL CONFERENCE ON ENTERPRISE INFORMATION SYSTEMS, VOL 1 (ICEIS), 2016: 111 - 118