An Automated Metadata Generation Method for Data Lake of Industrial WoT Applications

Cited by: 5
Authors
Yu, Han [1]
Cai, Hongming [1]
Liu, Zhiyuan [1]
Xu, Boyi [2]
Jiang, Lihong [1]
Affiliations
[1] Shanghai Jiao Tong Univ, Sch Software, Shanghai 200240, Peoples R China
[2] Shanghai Jiao Tong Univ, Coll Econ & Management, Shanghai 200052, Peoples R China
Source
IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS | 2022 / Vol. 52 / No. 08
Funding
National Natural Science Foundation of China;
Keywords
Metadata; Semantics; Runtime; Data mining; Ontologies; Text recognition; Conferences; Data lake (DL); data modeling; entity recognition; metadata generation; stream processing; Web of Things (WoT); ACQUISITION; EXTRACTION;
DOI
10.1109/TSMC.2021.3119871
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Recent trends in the Web of Things (WoT) have led to a data explosion. The data lake (DL), a flexible, on-demand architecture for managing heterogeneous data, has become a feasible solution for data management. Metadata modeling for DLs is the key basis for smart analysis and processing. However, the variety in the structures and semantics of industrial WoT data hinders metadata modeling and maintenance. Moreover, the lack of textual descriptions and the semantics hidden in value streams make it hard to construct semantic metadata automatically. The dynamic nature of WoT also requires metadata to evolve in a timely manner. To overcome these challenges, we propose an automated bottom-up metadata generation approach for DLs of WoT applications. Within a data-driven framework, raw data are notated as linked data, and self-organizing map-based online clustering is applied to extract data characteristics in real time. To recognize entities, concepts, and relations, a semantics-based approach to entity discovery from short texts is proposed, tailored to the features of WoT data. Numerical analysis is performed to find hidden relations in raw values. Full-dimensional metadata with rich semantic knowledge are finally built. Experiments on a real-world dataset verify the effectiveness of the methods, and a case study on an energy WoT system demonstrates the feasibility of the approach.
Pages: 5235-5248
Page count: 14