Named entity recognition for construction documents based on fine-tuning of large language models with low-quality datasets

Authors
Zhou, Junyu [1]
Ma, Zhiliang [1]
Affiliations
[1] Tsinghua Univ, Dept Civil Engn, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Construction documents; Large language model; Named entity recognition; Low-quality datasets
DOI
10.1016/j.autcon.2025.106151
Chinese Library Classification number
TU [Building Science]
Discipline classification code
0813
Abstract
Named Entity Recognition (NER) is a fundamental task in automatically processing and reusing documents. Traditional methods rely on machine learning models trained on costly high-quality datasets. This paper proposes an NER method for construction documents based on fine-tuning Large Language Models (LLMs) with low-quality datasets. First, low-quality datasets of three types (generation, tagging, and question-answering) were semi-automatically generated from national standards, qualification textbooks, and lexicons. They were then used to fine-tune an LLM for NER of structural elements in order to identify the optimal fine-tuning parameter settings. Next, the outputs of the optimally fine-tuned LLM were used to iteratively refine the low-quality dataset and improve performance, and the F1 score finally reached 0.756. Similar results were obtained for two other types of named entities, illustrating the generalizability of the method. The paper thus provides a more effective and efficient method for reusing construction documents. Future research should explore whether other methods can achieve better results.
Pages: 15