Named entity recognition for construction documents based on fine-tuning of large language models with low-quality datasets

Times cited: 0

Authors:

Zhou, Junyu [1]

Ma, Zhiliang [1]

Affiliation:

[1] Tsinghua Univ, Dept Civil Engn, Beijing, Peoples R China

Source:

AUTOMATION IN CONSTRUCTION | 2025 / Vol. 174

Funding:

National Natural Science Foundation of China;

Keywords:

Construction documents; Large language model; Named entity recognition; Low-quality datasets;

DOI:

10.1016/j.autcon.2025.106151

Chinese Library Classification (CLC) number:

TU [Architectural Science];

Discipline classification code:

0813;

Abstract:

Named Entity Recognition (NER) is a fundamental task for automatically processing and reusing documents. Traditional methods rely on machine learning trained with costly high-quality datasets. This paper proposes an NER method for construction documents based on fine-tuning Large Language Models (LLMs) with low-quality datasets. First, low-quality datasets of three types (generation-type, tagging-type, and question-answering-type) were semi-automatically generated from national standards, qualification textbooks, and lexicons. These datasets were then used to fine-tune an LLM for NER of structural elements and to determine the optimal fine-tuning parameter settings. Next, the outputs of the optimally fine-tuned LLM were used to iteratively refine the low-quality datasets, further improving performance. The final F1 score reached 0.756. Similar results were obtained for two other types of named entities, demonstrating the method's generalizability. This paper provides a more effective and efficient method for reusing construction documents. Future research should explore whether other methods can achieve better results.
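The abstract mentions tagging-type datasets derived partly from lexicons and reports performance as an F1 score. As a hedged illustration only (the abstract does not describe the authors' labeling procedure or their span-matching criterion), the sketch below weakly labels a sentence by greedy longest-match lexicon lookup, producing the kind of noisy labels a low-quality dataset can contain, and scores them with exact-match entity-level precision, recall, and F1. The function names `lexicon_tag` and `entity_f1`, the `ELEMENT` label, and the example lexicon are all hypothetical.

```python
from typing import Dict, Set, Tuple

# Hypothetical span format: (entity_type, start_char, end_char), end exclusive.
Span = Tuple[str, int, int]


def lexicon_tag(text: str, lexicon: Dict[str, str]) -> Set[Span]:
    """Weakly label text by greedy longest-match lookup against a lexicon.

    Assumption: plain substring matching with no word-boundary checks.
    Labels made this way are noisy: terms missing from the lexicon are
    skipped or only partially tagged.
    """
    spans: Set[Span] = set()
    terms = sorted(lexicon, key=len, reverse=True)  # try longer terms first
    i = 0
    while i < len(text):
        for term in terms:
            if text.startswith(term, i):
                spans.add((lexicon[term], i, i + len(term)))
                i += len(term)  # skip past the matched term
                break
        else:
            i += 1  # no term starts here; advance one character
    return spans


def entity_f1(predicted: Set[Span], gold: Set[Span]) -> Tuple[float, float, float]:
    """Entity-level precision, recall and F1, counting only exact span matches."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    text = "The beam is supported by a column and a shear wall."
    # Hypothetical lexicon that lacks the compound term "shear wall".
    lexicon = {"beam": "ELEMENT", "column": "ELEMENT", "wall": "ELEMENT"}
    predicted = lexicon_tag(text, lexicon)
    # Hand-checked gold spans for the same sentence.
    gold = {
        ("ELEMENT", text.index("beam"), text.index("beam") + len("beam")),
        ("ELEMENT", text.index("column"), text.index("column") + len("column")),
        ("ELEMENT", text.index("shear wall"), text.index("shear wall") + len("shear wall")),
    }
    p, r, f1 = entity_f1(predicted, gold)
    print(round(p, 3), round(r, 3), round(f1, 3))  # prints 0.667 0.667 0.667
```

Because the lexicon lacks "shear wall", only "wall" is tagged, so that span does not exactly match the gold entity and precision, recall, and F1 all fall to 2/3. This is one way label noise in a low-quality dataset can show up as lost F1; a fine-tuned model's outputs could, in principle, be used to correct such spans in the kind of iterative refinement the abstract describes.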

Pages: 15