A Study Case of Automatic Archival Research and Compilation using Large Language Models

Cited by: 0

Authors:

Guo, Dongsheng [1]

Yue, Aizhen [1]

Ning, Fanggang [2]

Huang, Dengrong [1]

Chang, Bingxin [1]

Duan, Qiang [1]

Zhang, Lianchao [2]

Chen, Zhaoliang [2]

Zhang, Zheng [1]

Zhan, Enhao [1]

Zhang, Qilai [1]

Jiang, Kai [1]

Li, Rui [1]

Zhao, Shaoxiang [2]

Wei, Zizhong [1]

Affiliations:

[1] Inspur Acad Sci & Technol, Jinan, Shandong, Peoples R China

[2] Inspur Software Co Ltd, Jinan, Shandong, Peoples R China

Source:

2023 IEEE INTERNATIONAL CONFERENCE ON KNOWLEDGE GRAPH, ICKG | 2023

Keywords:

Archival research and compilation; Automatic method; Large language models; Fine-tuning

DOI:

10.1109/ICKG59574.2023.00012

Chinese Library Classification number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Archival research and compilation is a specialized task that focuses on the exploration, selection, and processing of vast quantities of archival documents pertaining to specific subjects. Traditionally, this task has been labor-intensive and time-consuming. In recent years, advances in artificial intelligence have made automatic archival research and compilation feasible. However, the limited availability of relevant samples significantly constrains the application of deep learning models, which demand large amounts of data and knowledge. In this paper, we present a case study and propose a method for automatic archival research and compilation that leverages the knowledge base and text generation ability of large language models. Specifically, our method comprises three components: document retrieval, document summarization, and rule-based compilation. In the document summarization component, we use fine-tuned large language models to improve performance through simulation data generation and summary generation. Experimental results substantiate the effectiveness of our method. Furthermore, our method offers a general approach to using large language models, as well as a solution to similar challenges in other domains.

Pages: 52-59

Page count: 8

