Efficient IoT Data Management for Geological Disasters Based on Big Data-Turbocharged Data Lake Architecture

Cited by: 4
Authors
Huang, Xiaohui [1]
Fan, Junqing [1, 2]
Deng, Ze [1, 2]
Yan, Jining [1, 2]
Li, Jiabao [1]
Wang, Lizhe [1, 2]
Affiliations
[1] School of Computer Science, China University of Geosciences, Wuhan 430074, People's Republic of China
[2] Hubei Key Laboratory of Intelligent Geo-Information Processing, China University of Geosciences, Wuhan 430074, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
geohazards; IoT data; data management; data lake; distributed computing; Internet; Spark
DOI
10.3390/ijgi10110743
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
Multi-source Internet of Things (IoT) data archived in institutional repositories are increasingly being made openly available through web services, so that scientists, developers, and decision makers can access them to advance research on geohazard prevention. In this paper, we design and implement a big data-turbocharged system for effective IoT data management based on the data lake architecture. First, we propose a multi-threaded parallel ingestion method that pulls IoT data from institutional repositories in parallel. Next, we design storage strategies for both ingested and processed IoT data so that both are kept in a scalable, reliable storage environment. We also build a distributed cache layer to provide fast access to IoT data. Then, we provide users with a unified, SQL-based interactive environment for exploring IoT data, leveraging the processing capability of Apache Spark. In addition, we design a standards-based metadata model that describes the ingested IoT data and thus supports IoT dataset discovery. Finally, we implement a prototype system and evaluate its efficiency through experiments on real IoT data repositories.
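The parallel ingestion step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation, and the paper's method may partition the work differently: it assumes each institutional repository can be read page by page, and it models the repositories, the paged fetch, and the data lake's raw storage zone in memory. All names below (`REPOSITORIES`, `fetch_batch`, `RawZone`, `ingest_repository`, `ingest_all`, `page_size`) are hypothetical. Python's standard `concurrent.futures.ThreadPoolExecutor` runs one ingestion task per repository, and a lock keeps concurrent writes to the shared store safe.

```python
"""Minimal sketch of multi-threaded parallel ingestion into a data-lake raw zone.

Assumptions (hypothetical, not from the paper): repositories are in-memory lists
of records, and fetch_batch() stands in for a repository's web-service call.
"""
from concurrent.futures import ThreadPoolExecutor, as_completed
from threading import Lock
from typing import Dict, List

# Hypothetical stand-ins for institutional IoT repositories (name -> records).
REPOSITORIES: Dict[str, List[dict]] = {
    "repo_a": [{"sensor": "rain-01", "value": 12.5}, {"sensor": "rain-02", "value": 8.0}],
    "repo_b": [{"sensor": "gnss-07", "value": 0.31}],
    "repo_c": [{"sensor": "tilt-03", "value": 1.2}, {"sensor": "tilt-04", "value": 0.9},
               {"sensor": "tilt-05", "value": 1.1}],
}


def fetch_batch(repo: str, offset: int, size: int) -> List[dict]:
    """Hypothetical paged fetch; a real system would call the repository's web service."""
    return REPOSITORIES[repo][offset:offset + size]


class RawZone:
    """Thread-safe append-only store standing in for the lake's raw-data storage."""

    def __init__(self) -> None:
        self._data: Dict[str, List[dict]] = {}
        self._lock = Lock()

    def append(self, repo: str, records: List[dict]) -> None:
        # Several worker threads may append at once, so guard the shared dict.
        with self._lock:
            self._data.setdefault(repo, []).extend(records)

    def snapshot(self) -> Dict[str, List[dict]]:
        # Return a copy so callers never observe a half-written state.
        with self._lock:
            return {k: list(v) for k, v in self._data.items()}


def ingest_repository(repo: str, zone: RawZone, page_size: int = 2) -> int:
    """Pull one repository page by page and land its records unchanged in the raw zone."""
    offset, total = 0, 0
    while True:
        batch = fetch_batch(repo, offset, page_size)
        if not batch:  # an empty page means the repository is exhausted
            return total
        zone.append(repo, batch)
        offset += len(batch)
        total += len(batch)


def ingest_all(repos: List[str], zone: RawZone, max_workers: int = 4) -> Dict[str, int]:
    """Ingest several repositories concurrently, one task per repository."""
    counts: Dict[str, int] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(ingest_repository, r, zone): r for r in repos}
        for fut in as_completed(futures):  # tasks finish in arbitrary order
            counts[futures[fut]] = fut.result()
    return counts


if __name__ == "__main__":
    raw_zone = RawZone()
    result = ingest_all(sorted(REPOSITORIES), raw_zone)
    print(dict(sorted(result.items())))  # {'repo_a': 2, 'repo_b': 1, 'repo_c': 3}
```

Threads are a reasonable fit for this kind of step because ingestion is typically dominated by waiting on network I/O rather than by CPU work. In a real deployment, `fetch_batch` would wrap the repository's web-service call and `RawZone.append` would write to the lake's scalable storage instead of a Python dict.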
Pages: 18