A Study Case of Automatic Archival Research and Compilation using Large Language Models

Cited by: 0

Authors:

Guo, Dongsheng [1]

Yue, Aizhen [1]

Ning, Fanggang [2]

Huang, Dengrong [1]

Chang, Bingxin [1]

Duan, Qiang [1]

Zhang, Lianchao [2]

Chen, Zhaoliang [2]

Zhang, Zheng [1]

Zhan, Enhao [1]

Zhang, Qilai [1]

Jiang, Kai [1]

Li, Rui [1]

Zhao, Shaoxiang [2]

Wei, Zizhong [1]

Affiliations:

[1] Inspur Acad Sci & Technol, Jinan, Shandong, Peoples R China

[2] Inspur Software Co Ltd, Jinan, Shandong, Peoples R China

Source:

2023 IEEE INTERNATIONAL CONFERENCE ON KNOWLEDGE GRAPH, ICKG | 2023

Keywords:

Archival research and compilation; Automatic method; Large language models; Fine-tuning

DOI:

10.1109/ICKG59574.2023.00012

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Archival research and compilation is a specialized task focused on exploring, selecting, and processing large volumes of archival documents on specific subjects. Traditionally, the task has been labor-intensive and time-consuming. In recent years, advances in artificial intelligence have made automatic archival research and compilation feasible. However, the limited availability of relevant samples significantly constrains the application of deep learning models, which require large amounts of data and knowledge. In this paper, we present a case study and propose a method for automatic archival research and compilation that leverages the knowledge base and text generation ability of large language models. Specifically, our method comprises three components: document retrieval, document summarization, and rule-based compilation. In the document summarization component, we use fine-tuned large language models to improve performance through simulation data generation and summary generation. Experimental results substantiate the effectiveness of our method. Furthermore, our method offers a general approach to using large language models, as well as a solution to similar challenges in other domains.


Pages: 52-59

Number of pages: 8
