Reliability of large language models as a tool for knowledge extraction from biographical dictionaries: the case of the Polish Biographical Dictionary

Authors
Jaskulski, Piotr [1]
Latos, Tomasz [1]
Rynca, Mariusz [1]
Zapala, Adam [1]
Affiliation
[1] Polish Acad Sci, Tadeusz Manteuffel Inst Hist, Rynek Starego Miasta 31, PL-00272 Warsaw, Poland
Keywords
information extraction; biographies; large language models
DOI
10.1093/llc/fqaf014
Chinese Library Classification (CLC) number
C [Social sciences (general)]
Discipline classification code
03; 0303
Abstract
Large language models (LLMs) are tools with great potential for text processing. This study assesses the reliability of their output when extracting structured knowledge from unstructured textual sources, specifically the biographies in the Polish Biographical Dictionary. The model was tasked with extracting information about each individual, such as the date and place of birth, death and burial, family relationships, important people, related settlements and institutions, and positions held. The test was conducted on a sample of 250 biographies, written in Polish from the 1930s onwards and describing the lives of individuals from various historical periods. The results show that the LLM is very effective at identifying basic personal data, important family relationships, and the occupations or offices held by the subjects. Results were weaker when the model attempted to identify institutions and places associated with the subjects. The outcome suggests that LLMs can efficiently assist in digitizing and structuring historical biographical data, offering a promising tool for improving historical knowledge bases and speeding up work compared with manual information extraction.
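The abstract lists the categories of information the model was asked to extract. The sketch below shows one possible way to represent such output and compare it with a manually annotated biography. It is only an illustration. The `BiographyRecord` schema, its field names, the assumption that the model answers in JSON, and the exact-match precision/recall scoring are all hypothetical. The abstract does not describe the authors' output format or evaluation metric.

```python
import json
from dataclasses import dataclass, field, fields
from typing import List, Optional, Tuple


# Hypothetical target schema covering the categories named in the abstract.
# Field names are illustrative assumptions, not the authors' actual format.
@dataclass
class BiographyRecord:
    birth_date: Optional[str] = None
    birth_place: Optional[str] = None
    death_date: Optional[str] = None
    death_place: Optional[str] = None
    burial_place: Optional[str] = None
    family: List[str] = field(default_factory=list)
    important_people: List[str] = field(default_factory=list)
    settlements: List[str] = field(default_factory=list)
    institutions: List[str] = field(default_factory=list)
    positions: List[str] = field(default_factory=list)


def parse_model_output(raw: str) -> BiographyRecord:
    """Parse a (hypothetical) JSON answer from the model, dropping keys outside the schema."""
    data = json.loads(raw)
    known = {f.name for f in fields(BiographyRecord)}
    return BiographyRecord(**{k: v for k, v in data.items() if k in known})


def list_field_scores(predicted: List[str], gold: List[str]) -> Tuple[float, float]:
    """Precision and recall for one list-valued field, using exact string match.

    By convention here, an empty prediction (or empty gold list) yields a
    vacuous score of 1.0 for precision (or recall).
    """
    pred, ref = set(predicted), set(gold)
    hits = len(pred & ref)
    precision = hits / len(pred) if pred else 1.0
    recall = hits / len(ref) if ref else 1.0
    return precision, recall


# Usage with made-up values: the model returns two positions, one of which
# matches the manual annotation.
record = parse_model_output('{"birth_place": "Krakow", "positions": ["castellan", "voivode"]}')
p, r = list_field_scores(record.positions, ["voivode", "starost"])
print(p, r)
```

Exact string matching is a deliberate simplification. For Polish texts, inflected forms of place and institution names would likely require normalization or lemmatization before comparison. This kind of mismatch may be one reason why categories such as institutions and places are harder to score.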
Pages: 12