Time Period Categorization in Fiction: A Comparative Analysis of Machine Learning Techniques

Cited by: 0
Authors
Westin, Fereshta [1, 2]
Affiliations
[1] Univ Boras, Boras, Sweden
[2] Univ Boras, Allegatan 1, Boras, Sweden
Keywords
Cataloging for digital resources; time period categorization; machine learning; text analysis; fiction; LDA; SBERT; TF-IDF; classification
DOI
10.1080/01639374.2024.2315548
Chinese Library Classification
G25 [Library science, librarianship]; G35 [Information science, information work]
Subject classification codes
1205; 120501
Abstract
This study investigates the automatic categorization of time period metadata in fiction, a critical but often overlooked aspect of cataloging. Using a comparative analysis approach, the performance of three machine learning techniques, namely Latent Dirichlet Allocation (LDA), Sentence-BERT (SBERT), and Term Frequency-Inverse Document Frequency (TF-IDF), was assessed by examining their precision, recall, F1 scores, and confusion matrix results. LDA identifies underlying topics within the text, TF-IDF measures word importance, and SBERT measures sentence semantic similarity. Based on F1-score analysis and confusion matrix outcomes, TF-IDF and LDA categorized text data by time period effectively, while SBERT performed poorly across all time period categories.
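The evaluation measures named in the abstract (precision, recall, F1 score, and the confusion matrix) can be illustrated with a minimal Python sketch. The time period labels and the predictions below are hypothetical and chosen only for illustration; they are not data or results from the study, and the helper names `confusion_matrix` and `per_class_scores` are assumptions of this sketch.

```python
from typing import Dict, List, Tuple


def confusion_matrix(y_true: List[str], y_pred: List[str],
                     labels: List[str]) -> List[List[int]]:
    """Count predictions; rows are true labels, columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[index[true_label]][index[predicted_label]] += 1
    return matrix


def per_class_scores(matrix: List[List[int]],
                     labels: List[str]) -> Dict[str, Tuple[float, float, float]]:
    """Return (precision, recall, F1) for each label."""
    scores = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        # False positives: other classes predicted as this label (column i).
        fp = sum(matrix[row][i] for row in range(len(labels))) - tp
        # False negatives: this label predicted as something else (row i).
        fn = sum(matrix[i]) - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[label] = (precision, recall, f1)
    return scores


# Hypothetical time period categories and predictions (not from the study).
labels = ["medieval", "19th century", "contemporary"]
y_true = ["medieval", "medieval", "19th century",
          "19th century", "contemporary", "contemporary"]
y_pred = ["medieval", "19th century", "19th century",
          "19th century", "contemporary", "medieval"]

matrix = confusion_matrix(y_true, y_pred, labels)
scores = per_class_scores(matrix, labels)

for label in labels:
    precision, recall, f1 = scores[label]
    print(f"{label}: precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Reporting the scores per category rather than as a single average makes it visible when a method does well on one time period and poorly on another, which is the kind of comparison the abstract describes.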
Pages: 124-153
Page count: 30