Leveraging Large Language Models for Sensor Data Retrieval

Cited by: 1
Authors
Berenguer, Alberto [1]
Morejon, Adriana [1]
Tomas, David [1]
Mazon, Jose-Norberto [1]
Affiliations
[1] Univ Alicante, Dept Software & Comp Syst, Carretera San Vicente del Raspeig S-N, Alicante 03690, Spain
Source
APPLIED SCIENCES-BASEL | 2024 / Vol. 14 / Issue 06
Keywords
sensor data; large language models; word embeddings; data retrieval; FAIR principles
DOI
10.3390/app14062506
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
The growing significance of sensor data in the development of information technology services is hindered by disparate data presentations and non-adherence to FAIR principles. This paper introduces a novel approach for sensor data gathering and retrieval. The proposal leverages large language models to convert sensor data into FAIR-compliant formats and to provide word embedding representations of tabular data for subsequent exploration, enabling semantic comparison. The proposed system comprises two primary components. The first gathers data from sensors and converts it into a reusable structured format, while the second identifies the sensor data most relevant for augmenting a given user-provided dataset. The evaluation compared the performance of various large language models in generating a representative word embedding for each table, which is then used to retrieve related sensor data. The results show promising performance in terms of precision and mean reciprocal rank (MRR) (0.90 and 0.94 for the best-performing model, respectively), indicating the system's ability to retrieve pertinent sensor data that fulfil user requirements.
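As a rough illustration of the retrieval and evaluation idea described in the abstract, the sketch below ranks candidate tables by the similarity of their embeddings to a query table and then scores the ranking with precision and MRR. It is not the authors' implementation: the three-dimensional vectors are toy placeholders for embeddings a language model would produce, cosine similarity is an assumed comparison measure (the abstract only says "semantic comparison"), precision is computed over the top k results as an assumption, and all function and table names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_tables(query_vec, table_vecs):
    """Return table ids sorted by descending similarity to the query (ties broken by id)."""
    scored = [(cosine(query_vec, vec), table_id) for table_id, vec in table_vecs.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [table_id for _, table_id in scored]


def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k retrieved tables that are relevant."""
    top = ranked[:k]
    if not top:
        return 0.0
    return sum(1 for table_id in top if table_id in relevant) / len(top)


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant table, or 0.0 if none was retrieved."""
    for position, table_id in enumerate(ranked, start=1):
        if table_id in relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(runs):
    """Average reciprocal rank over (ranked_list, relevant_set) pairs."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(reciprocal_rank(ranked, relevant) for ranked, relevant in runs) / len(runs)


# Toy data: hypothetical sensor tables with made-up embedding vectors.
table_vecs = {
    "temp_sensor": [0.9, 0.1, 0.0],
    "humidity_sensor": [0.7, 0.3, 0.1],
    "traffic_counter": [0.0, 0.2, 0.9],
}
query_vec = [1.0, 0.2, 0.0]  # placeholder embedding of the user-provided dataset

ranked = rank_tables(query_vec, table_vecs)
print(ranked)
print(precision_at_k(ranked, {"temp_sensor", "humidity_sensor"}, k=2))
print(mean_reciprocal_rank([(ranked, {"temp_sensor"}), (ranked, {"humidity_sensor"})]))
```

In practice the placeholder vectors would be replaced by table embeddings from whichever model is being evaluated, and the relevance sets would come from a labelled benchmark.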
Pages: 18
Related papers
62 in total
  • [11] Table Search Using a Deep Contextualized Language Model
    Chen, Zhiyu
    Trabelsi, Mohamed
    Heflin, Jeff
    Xu, Yinan
    Davison, Brian D.
    [J]. PROCEEDINGS OF THE 43RD INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '20), 2020, : 589 - 598
  • [12] Devlin J, 2019, 2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, P4171
  • [13] A Reference Architecture and Model for Sensor Data Warehousing
    Dobson, Simon
    Golfarelli, Matteo
    Graziani, Simone
    Rizzi, Stefano
    [J]. IEEE SENSORS JOURNAL, 2018, 18 (18) : 7659 - 7670
  • [14] TabularNet: A Neural Network Architecture for Understanding Semantic Structures of Tabular Data
    Du, Lun
    Gao, Fei
    Chen, Xu
    Jia, Ran
    Wang, Junshan
    Zhang, Jiang
    Han, Shi
    Zhang, Dongmei
    [J]. KDD '21: PROCEEDINGS OF THE 27TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2021, : 322 - 331
  • [15] Fang Q., 2006, P IEEE INFOCOM 25 IE, P1, DOI 10.1109/INFOCOM.2006.115
  • [16] Gunther M., 2021, AIDM 21 4 WORKSHOP E, P24, DOI 10.1145/3464509.3464892
  • [17] Gur I., 2023, FINDINGS ASS COMPUTA, P2803, DOI 10.18653/v1/2023.findings-emnlp.185
  • [18] Herzig J, 2020, 58TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2020), P4320
  • [19] Howard J, 2018, PROCEEDINGS OF THE 56TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL), VOL 1, P328
  • [20] Huang H, 2024, Arxiv, DOI [arXiv:2310.07676, DOI 10.18653/V1/2024.FINDINGS-NAACL.94]