Leveraging Large Language Models for Sensor Data Retrieval

Cited by: 1
Authors
Berenguer, Alberto [1]
Morejon, Adriana [1]
Tomas, David [1]
Mazon, Jose-Norberto [1]
Affiliation
[1] Univ Alicante, Dept Software & Comp Syst, Carretera San Vicente del Raspeig S-N, Alicante 03690, Spain
Source
APPLIED SCIENCES-BASEL | 2024 / Vol. 14 / Issue 06
Keywords
sensor data; large language models; word embeddings; data retrieval; FAIR principles;
DOI
10.3390/app14062506
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
Sensor data are increasingly significant in the development of information technology services, but their use is hindered by disparate data presentations and non-adherence to FAIR principles. This paper introduces a novel approach for sensor data gathering and retrieval. The proposal leverages large language models to convert sensor data into FAIR-compliant formats and to provide word embedding representations of tabular data for subsequent exploration, enabling semantic comparison. The proposed system comprises two primary components. The first gathers data from sensors and converts it into a reusable structured format, while the second identifies the most relevant sensor data to augment a given user-provided dataset. The evaluation compared the performance of various large language models in generating representative word embeddings for each table to retrieve related sensor data. The results show promising performance in terms of precision and mean reciprocal rank (MRR) (0.90 and 0.94 for the best-performing model, respectively), indicating the system's ability to retrieve pertinent sensor data that fulfil user requirements.
Pages: 18
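The abstract describes ranking sensor tables by the similarity of their embeddings to a user-provided dataset and scoring the ranking with precision and MRR. The following is a minimal sketch of that idea only, not the authors' implementation. The three-dimensional vectors, the table names, and every function name are hypothetical stand-ins, cosine similarity is an assumed choice of comparison, and in the paper the embeddings come from large language models.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_tables(query_vec, table_vecs):
    """Return table ids ordered from most to least similar to the query."""
    return sorted(
        table_vecs,
        key=lambda table_id: cosine(query_vec, table_vecs[table_id]),
        reverse=True,
    )


def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked tables that are relevant."""
    return sum(1 for table_id in ranked[:k] if table_id in relevant) / k


def reciprocal_rank(ranked, relevant):
    """1 / position of the first relevant table, or 0.0 if none is found."""
    for position, table_id in enumerate(ranked, start=1):
        if table_id in relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(runs):
    """Average reciprocal rank over (ranking, relevant set) pairs."""
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


# Hypothetical toy embeddings; real ones would be produced by a language model.
table_vecs = {
    "air_quality": [0.9, 0.1, 0.0],
    "temperature": [0.7, 0.6, 0.1],
    "traffic": [0.0, 0.2, 0.9],
}
query_vec = [1.0, 0.2, 0.0]  # embedding of the user-provided dataset (assumed)

ranked = rank_tables(query_vec, table_vecs)
p_at_2 = precision_at_k(ranked, {"air_quality", "temperature"}, 2)
mrr = mean_reciprocal_rank([(ranked, {"air_quality"}), (ranked, {"temperature"})])
print(ranked, p_at_2, mrr)
```

Precision rewards having many relevant tables near the top, while MRR only looks at where the first relevant table appears, which is why the two figures reported in the abstract can differ.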