Using Large Language Models to Enhance the Reusability of Sensor Data

Cited by: 3
Authors
Berenguer, Alberto [1]
Morejon, Adriana [1]
Tomas, David [1]
Mazon, Jose-Norberto [1]
Affiliations
[1] Univ Alicante, Dept Software & Comp Syst, Carretera San Vicente Del Raspeig S-N, San Vicente Del Raspeig 03690, Spain
Keywords
Internet of Things; sensor data; interoperability; data reusability; data processing; INTERNET
DOI
10.3390/s24020347
Chinese Library Classification (CLC)
O65 [Analytical Chemistry]
Discipline classification codes
070302; 081704
Abstract
The Internet of Things generates vast data volumes via diverse sensors, yet its potential remains unexploited for innovative data-driven products and services. Limitations arise from sensor-dependent data handling by manufacturers and user companies, hindering third-party access and comprehension. Initiatives like the European Data Act aim to enable high-quality access to sensor-generated data by regulating accuracy, completeness, and relevance while respecting intellectual property rights. Despite data availability, interoperability challenges impede sensor data reusability. For instance, sensor data shared in HTML formats requires intricate, time-consuming processing to attain reusable formats like JSON or XML. This study introduces a methodology aimed at converting raw sensor data extracted from web portals into structured formats, thereby enhancing data reusability. The approach utilises large language models to derive structured formats from sensor data initially presented in non-interoperable formats. The effectiveness of these language models was assessed through quantitative and qualitative evaluations in a use case involving meteorological data. In the proposed experiments, GPT-4, the best-performing LLM tested, demonstrated the feasibility of this methodology, achieving a precision of 93.51% and a recall of 85.33% in converting HTML to JSON/XML, thus confirming its potential in obtaining reusable sensor data.
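The abstract reports precision and recall for the HTML-to-JSON/XML conversion but does not state how they were computed. As a rough illustration only, the sketch below scores a converted JSON document against a hand-made reference by comparing flattened (path, value) pairs. The matching rule, the helper names `flatten` and `precision_recall`, and the sample weather records are all assumptions made for this example, not the authors' evaluation procedure.

```python
import json


def flatten(obj, prefix=""):
    """Flatten nested JSON into a set of (path, value) pairs.

    Hypothetical helper: exact-match comparison of leaf values is an
    assumption, not the metric definition used in the paper.
    """
    pairs = set()
    if isinstance(obj, dict):
        for key, value in obj.items():
            pairs |= flatten(value, f"{prefix}/{key}")
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            pairs |= flatten(value, f"{prefix}/{index}")
    else:
        # Leaf value: compare as a string so 40 and "40" count as equal.
        pairs.add((prefix, str(obj)))
    return pairs


def precision_recall(predicted_json, reference_json):
    """Return (precision, recall) of predicted leaf pairs against the reference."""
    predicted = flatten(json.loads(predicted_json))
    reference = flatten(json.loads(reference_json))
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Made-up meteorological record: the reference has four fields, while the
# converted output has three, one of them with a wrong value.
reference = '{"station": "A1", "temperature_c": 21.5, "humidity_pct": 40, "wind_kmh": 12}'
predicted = '{"station": "A1", "temperature_c": 21.5, "humidity_pct": 41}'

p, r = precision_recall(predicted, reference)
print(f"precision={p:.2f} recall={r:.2f}")
```

In this toy case two of the three extracted fields are correct and two of the four reference fields are recovered, so precision exceeds recall, the same direction as the figures reported in the abstract.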
Pages: 20
Related papers
32 in total
  • [1] Azerbayev Z, 2024, arXiv, DOI arXiv:2310.10631
  • [2] Bodenbenner M., 2021, Meas.: Sens, V18, P100206, DOI 10.1016/j.measen.2021.100206
  • [3] Brown TB, 2020, ADV NEUR IN, V33
  • [4] A literature review on question answering techniques, paradigms and systems
    Calijorne Soares, Marco Antonio
    Parreiras, Fernando Silva
    [J]. JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES, 2020, 32 (06) : 635 - 646
  • [5] LLM Multimodal Traffic Accident Forecasting
    de Zarza, I.
    de Curto, J.
    Roig, Gemma
    Calafate, Carlos T.
    [J]. SENSORS, 2023, 23 (22)
  • [6] Devlin J, 2019, 2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, P4171
  • [7] A Large-Scale Analysis of IoT Firmware Version Distribution in the Wild
    Ebbers, Frank
    [J]. IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 2023, 49 (02) : 816 - 830
  • [8] Fan D., 2018, P 2018 INT C NETWORK, P442
  • [9] ShoVAT: Shodan-based vulnerability assessment tool for Internet-facing services
    Genge, Bela
    Enachescu, Calin
    [J]. SECURITY AND COMMUNICATION NETWORKS, 2016, 9 (15) : 2696 - 2714
  • [10] FactDAG: Formalizing Data Interoperability in an Internet of Production
    Gleim, Lars
    Pennekamp, Jan
    Liebenberg, Martin
    Buchsbaum, Melanie
    Niemietz, Philipp
    Knape, Simon
    Epple, Alexander
    Storms, Simon
    Trauth, Daniel
    Bergs, Thomas
    Brecher, Christian
    Decker, Stefan
    Lakemeyer, Gerhard
    Wehrle, Klaus
    [J]. IEEE INTERNET OF THINGS JOURNAL, 2020, 7 (04) : 3243 - 3253