Named entity recognition for construction documents based on fine-tuning of large language models with low-quality datasets
Cited by: 0
Authors:
Zhou, Junyu [1]
Ma, Zhiliang [1]
Affiliations:
[1] Tsinghua Univ, Dept Civil Engn, Beijing, Peoples R China
Funding:
National Natural Science Foundation of China;
Keywords:
Construction documents;
Large language model;
Named entity recognition;
Low-quality datasets;
DOI:
10.1016/j.autcon.2025.106151
Chinese Library Classification (CLC) number:
TU [Building Science];
Discipline classification code:
0813;
Abstract:
Named Entity Recognition (NER) is a fundamental task in automatically processing and reusing documents. Traditional methods rely on machine learning models trained on costly high-quality datasets. This paper proposes an NER method for construction documents based on fine-tuning Large Language Models (LLMs) with low-quality datasets. First, low-quality datasets of three types (generation, tagging, and question-answering) were semi-automatically generated from national standards, qualification textbooks, and lexicons. These datasets were then used to fine-tune an LLM for NER of structural elements in order to identify the optimal fine-tuning parameter settings. Next, the output of the optimally fine-tuned LLM was used to iteratively refine the low-quality dataset and improve performance, with the F1 score finally reaching 0.756. Similar results were obtained for two other types of named entities, indicating that the method generalizes. The paper thus provides a more effective and efficient method for reusing construction documents. Future research should explore how to achieve better results by using other methods.
Pages: 15