A Content-Driven ETL Processes for Open Data

Cited by: 3
Authors
Berro, Alain [1]
Megdiche, Imen [2]
Teste, Olivier [3]
Affiliations
[1] Univ Toulouse 1, Manufacture Tabacs, F-31042 Toulouse, France
[2] Univ Toulouse 3, IRIT, F-31062 Toulouse, France
[3] Univ Toulouse 2, IUT Blagnac, F-31058 Toulouse, France
Source
NEW TRENDS IN DATABASE AND INFORMATION SYSTEMS II | 2015 / Vol. 312
Keywords
Open Data; ETL; Graphs; Self-Service BI; Hierarchical classification; Data warehouse;
DOI
10.1007/978-3-319-10518-5_3
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Emerging statistical Open Data (OD) seems very promising for generating various analysis scenarios for decision-making systems. Nevertheless, OD has problematic characteristics such as semantic and structural heterogeneity, lack of schemas, autonomy and dispersion. These characteristics shake traditional Extract-Transform-Load (ETL) processes, since the latter generally deal with well-structured schemas. In this paper we propose content-driven ETL processes that automate the extraction phase "as far as possible", based only on the content of flat Open Data sources. Our processes rely on data annotations and data mining techniques to discover hierarchical relationships. Processed data are then transformed into instance-schema graphs to facilitate structural data integration and the definition of the multidimensional schemas of the data warehouse.
Pages: 29-40
Number of pages: 12
  • [10] Mazon J.N., 2012, P 2012 JOINT EDBT IC, P144, DOI [10.1145/2320765.2320812, DOI 10.1145/2320765.2320812]