Methodology of Big Data Integration from A Priori Unknown Heterogeneous Data Sources

Authors
Samoylov, Alexey [1]
Sergeev, Nikolay [1]
Kucherova, Margarita [1]
Denisov, Boris [2]
Affiliations
[1] Southern Fed Univ, Chekhova St 2, Taganrog, Russia
[2] Green Oasis Sch Tianmian, 4030 Shennan Middle Rd, Shenzhen 518026, Guangdong, Peoples R China
Source
PROCEEDINGS OF 2018 THE 2ND INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE (CSAI 2018) / 2018 THE 10TH INTERNATIONAL CONFERENCE ON INFORMATION AND MULTIMEDIA TECHNOLOGY (ICIMT 2018) | 2018
Keywords
Big Data; data integration; knowledge extraction; ETL; heterogeneous data sources; modeling; semantics; framework
DOI
10.1145/3297156.3297249
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The success of data preparation for Big Data analytics depends directly on the quality of data integration from heterogeneous data sources. Extract, Transform and Load (ETL) systems have proved to be an efficient solution for this task. To date, however, decisions at the stages of data selection, extraction rule definition, and transformation are usually made exclusively by a data specialist. This, in turn, causes problems such as redundancy and inconsistency of imported data, and narrow specialization of rules (to the point of uniqueness) with a limited number of analytical models and known requirements for the data mart. This paper presents a concept for solving the problem by providing methodological support for the Big Data preparation procedure, so that data can be collected efficiently from a priori unknown heterogeneous data sources.
Pages: 131-135
Number of pages: 5