The hierarchical agglomerative clustering with Gower index: A methodology for automatic design of OLAP cube in ecological data processing context

Cited by: 12
Authors
Sautot, Lucile [1 ,3 ,4 ]
Faivre, Bruno [1 ]
Journaux, Ludovic [2 ]
Molin, Paul [3 ]
Affiliations
[1] Univ Bourgogne, UMR CNRS UB Biogeosci 6282, F-21000 Dijon, France
[2] Univ Bourgogne, UFR Sci & Tech, Lab Informat Elect & Image, F-21000 Dijon, France
[3] Agrosup Dijon, DSIP, F-21000 Dijon, France
[4] AgroParisTech, F-75732 Paris, France
Keywords
OLAP; Hierarchical agglomerative clustering; Bird population; Automatic design; MULTIDIMENSIONAL DATA MODEL; EXPLORATION; TECHNOLOGY;
DOI
10.1016/j.ecoinf.2014.07.011
Chinese Library Classification number
Q14 [Ecology (bioecology)];
Subject classification code
071012 ; 0713 ;
Abstract
OLAP systems can improve ecological studies. Ecology studies, tracks, and analyzes phenomena across space and time and according to several parameters, and OLAP systems let ecologists browse large datasets. One focus of current research on OLAP systems is the automatic design of OLAP cubes and data warehouse schemas. This kind of work makes OLAP technology accessible to people who are not information technology experts. To be efficient, however, automatic OLAP building must take various cases into account. Moreover, OLAP technology is based on the concept of hierarchy, so hierarchical clustering methods are often used by OLAP system designers. In this article, we propose using hierarchical agglomerative clustering with a metric that comes from ecological studies (the Gower similarity index) to automatically build hierarchical dimensions in an OLAP cube. With this similarity index, we can perform hierarchical clustering on heterogeneous datasets that contain qualitative and quantitative variables. We offer a prototype automatic system that builds dimensions for an OLAP cube, and we measure the performance of this system according to the number of clustered individuals and the number of variables used for clustering. From these measurements, we can offer an approximation of performance on a large dataset. The Gower index in hierarchical agglomerative clustering thereby permits the management of heterogeneous datasets with missing values in the context of automatic building of OLAP cubes. With this methodology, we can build new dimensions based on hierarchies in the data that are not evident. Data mining methods can complement expert knowledge during the design of an OLAP cube, because these methods can explain the inherent structure of the data. (C) 2014 Elsevier B.V. All rights reserved.
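The approach described in the abstract can be illustrated with a minimal sketch, which is not the authors' prototype. It computes a Gower dissimilarity over mixed records (range-scaled absolute difference for quantitative variables, simple mismatch for qualitative ones, variables missing in either record skipped) and then merges clusters bottom-up. Average linkage, the function names, and the four bird-like records are all assumptions made for illustration; the abstract does not state which linkage rule or data layout the authors used.

```python
from itertools import combinations

def numeric_ranges(rows, kinds):
    """Range (max - min) of each quantitative column, ignoring missing values."""
    ranges = []
    for j, kind in enumerate(kinds):
        if kind != "num":
            ranges.append(None)
            continue
        vals = [r[j] for r in rows if r[j] is not None]
        ranges.append(max(vals) - min(vals) if vals else 0.0)
    return ranges

def gower_distance(a, b, kinds, ranges):
    """Gower dissimilarity between two records; None marks a missing value."""
    total, used = 0.0, 0
    for x, y, kind, rng in zip(a, b, kinds, ranges):
        if x is None or y is None:
            continue  # a variable missing in either record is skipped
        if kind == "num":
            total += abs(x - y) / rng if rng else 0.0
        else:
            total += 0.0 if x == y else 1.0
        used += 1
    # Assumption: records sharing no observed variable are maximally distant.
    return total / used if used else 1.0

def agglomerate(rows, kinds):
    """Average-linkage agglomerative clustering; returns (id_a, id_b, height) merges."""
    ranges = numeric_ranges(rows, kinds)
    dist = {(i, j): gower_distance(rows[i], rows[j], kinds, ranges)
            for i, j in combinations(range(len(rows)), 2)}
    clusters = {i: [i] for i in range(len(rows))}
    merges, next_id = [], len(rows)

    def avg(c1, c2):
        pairs = [(min(i, j), max(i, j)) for i in clusters[c1] for j in clusters[c2]]
        return sum(dist[p] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        c1, c2 = min(combinations(sorted(clusters), 2), key=lambda p: avg(*p))
        height = avg(c1, c2)
        clusters[next_id] = clusters.pop(c1) + clusters.pop(c2)
        merges.append((c1, c2, height))
        next_id += 1
    return merges

# Hypothetical records: (wing length in mm, diet, habitat); None is a missing value.
kinds = ["num", "cat", "cat"]
rows = [
    (70.0, "insect", "forest"),
    (72.0, "insect", "forest"),
    (120.0, "seed", None),
    (125.0, "seed", "field"),
]
merges = agglomerate(rows, kinds)
for a, b, h in merges:
    print(f"merge {a} + {b} at height {h:.3f}")
```

Each merge corresponds to one node of the resulting tree, so cutting the tree at chosen heights would yield the successive levels of a candidate dimension hierarchy. This pure-Python version recomputes average distances at every step and is only suitable for small examples.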
Pages: 217-230
Page count: 14