An Automated Metadata Generation Method for Data Lake of Industrial WoT Applications

Cited by: 5
Authors
Yu, Han [1]
Cai, Hongming [1]
Liu, Zhiyuan [1]
Xu, Boyi [2]
Jiang, Lihong [1]
Affiliations
[1] Shanghai Jiao Tong Univ, Sch Software, Shanghai 200240, Peoples R China
[2] Shanghai Jiao Tong Univ, Coll Econ & Management, Shanghai 200052, Peoples R China
Source
IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS | 2022 / Vol. 52 / No. 08
Funding
National Natural Science Foundation of China;
Keywords
Metadata; Semantics; Runtime; Data mining; Ontologies; Text recognition; Conferences; Data lake (DL); data modeling; entity recognition; metadata generation; stream processing; Web of Things (WoT); ACQUISITION; EXTRACTION;
DOI
10.1109/TSMC.2021.3119871
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Recent trends in the Web of Things (WoT) have led to a data explosion. The data lake (DL), a flexible, on-demand architecture for managing heterogeneous data, has become a feasible solution for data management. Metadata modeling for DLs is the key basis for smart analysis and processing. However, the variety in the structures and semantics of industrial WoT data hinders metadata modeling and maintenance. Moreover, the lack of textual descriptions and the semantics hidden in value streams make it hard to construct semantic metadata automatically. The dynamic nature of WoT also requires timely evolution of metadata. To overcome these challenges, we propose an automated bottom-up metadata generation approach for DLs of WoT applications. Within a data-driven framework, raw data are notated as linked data, and self-organizing map-based online clustering is applied to extract data characteristics in real time. To recognize entities, concepts, and relations, a semantics-based approach to entity discovery from short texts is proposed according to the features of WoT data. Numerical analysis is performed to find the hidden relations in raw values. Full-dimensional metadata with rich semantic knowledge are finally built. Experiments on a real-world dataset are conducted to verify the effectiveness of the methods, and a case study on an energy WoT system is provided to demonstrate the feasibility of the approach.
Pages: 5235 - 5248
Page count: 14
Related papers
50 in total
  • [21] A method for organizing metadata of storage nodes with data de-duplication
    Wang, Guohua
    Zhao, Yuelong
    Li, Tianxiang
    Liao, Jinggui
    Journal of Computational Information Systems, 2014, 10 (09): : 3845 - 3854
  • [22] Novel Conditional Metadata Embedding Data Preprocessing Method for Semantic Segmentation
    Wang, Juntuo
    Zhao, Qiaochu
    Lin, Dongheng
    Purwantot, Erick
    Man, Ka Lok
    2022 INTERNATIONAL CONFERENCE ON CYBER-ENABLED DISTRIBUTED COMPUTING AND KNOWLEDGE DISCOVERY, CYBERC, 2022, : 303 - 311
  • [23] Automated metadata, provenance cataloging and navigable interfaces: Ensuring the usefulness of extreme-scale data
    Schissel, D. P.
    Abla, G.
    Flanagan, S. M.
    Greenwald, M.
    Lee, X.
    Romosan, A.
    Shoshani, A.
    Stillerman, J.
    Wright, J.
    FUSION ENGINEERING AND DESIGN, 2014, 89 (05) : 745 - 749
  • [24] Query generation for retrieving data from distributed semistructured documents using a metadata interface
    Choe, Guija
    Nam, Young-Kwang
    Goguen, Joseph
    Wang, Guilian
    COMPUTER LANGUAGES SYSTEMS & STRUCTURES, 2009, 35 (04) : 422 - 434
  • [25] CWM based unified metadata management for data warehouse applying MDA method
    Zhang, L
    Liang, JA
    Ai, B
    ICCC2004: Proceedings of the 16th International Conference on Computer Communication Vol 1 and 2, 2004, : 1687 - 1691
  • [26] Metadata-based Store Procedure Design and Implement Method of Data Update
    Zhao, Weidong
    Lv, Xiaoni
    Lu, Xinming
    Li, Yong
    2011 AASRI CONFERENCE ON APPLIED INFORMATION TECHNOLOGY (AASRI-AIT 2011), VOL 2, 2011, : 278 - 281
  • [27] Applications of Generalized Difference Method for Hypothesis Generation to Social Big Data in Concept and Real Spaces
    Ishikawa, Hiroshi
    Kato, Daiju
    Endo, Masaki
    Hirota, Masaharu
    11TH INTERNATIONAL CONFERENCE ON MANAGEMENT OF DIGITAL ECOSYSTEMS (MEDES), 2019, : 44 - 55
  • [28] Scene-based Metadata Generation and Open API Provisioning Method for Smart Broadcast Service
    Kim, Seung-Hee
    Jung, Deokkyu
    Lee, Sang-Yun
    Kim, Sun-Joong
    2018 20TH INTERNATIONAL CONFERENCE ON ADVANCED COMMUNICATION TECHNOLOGY (ICACT), 2018, : 664 - 668
  • [29] An automated approach towards generation of stream attributes for use in GIS applications
    Pradhan, Ashis
    Pradhan, Mohan P.
    Pradhan, Ratika
    MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (07) : 20307 - 20356
  • [30] Automated breast imaging report generation based on the integration of multiple image features in a metadata format for shared decision-making
    Lo, Chung-Ming
    Chen, Hui-Ru
    HEALTH INFORMATICS JOURNAL, 2024, 30 (03)