Demystifying oral history with natural language processing and data analytics: a case study of the Densho digital collection

Cited by: 1
Authors
Chen, Haihua [1]
Kim, Jeonghyun [1]
Chen, Jiangping [1]
Sakata, Aisa [1]
Affiliations
[1] Univ North Texas, Dept Informat Sci, Denton, TX 76205 USA
Keywords
Digital archives; Densho; Oral history; Natural language processing; Data analytics; JAPANESE-AMERICAN INTERNMENT; WORLD-WAR-II; EXTRACTION; DISCOURSE; LIVES;
DOI
10.1108/EL-12-2023-0303
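Chinese Library Classification (CLC) number
G25 [Library science, librarianship]; G35 [Information science, information work]
Subject classification code
1205; 120501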
Abstract
Purpose: This study aims to explore the applications of natural language processing (NLP) and data analytics in understanding large-scale digital collections in oral history archives.
Design/methodology/approach: NLP and data analytics were used to analyse the oral interview transcripts of 904 survivors of the Japanese American incarceration camps collected from Densho Digital Repository, relying specifically on descriptive analysis, keyword extraction, topic modelling and sentiment analysis (SA).
Findings: The researchers found multiple geographic areas of large residential communities of ethnic Japanese people and the place names of the concentration camps. The keywords and topics extracted reflect the deplorable conditions and militaristic nature of the camps and the forced labour of the internees. When remembering history, the main focus for the narrators remains the redress and reparation movement to obtain the restitution of their civil rights. SA further found that the forcible removal and incarceration of Japanese Americans during the Second World War negatively impacted and brought deep trauma to the narrators.
Originality/value: This case study demonstrated how NLP and data analytics could be applied to analyse oral history archives and open avenues for discovery. Archival researchers and the general public may benefit from this type of analysis in making connections between temporal, spatial and emotional elements, which will contribute to a holistic understanding of individuals and communities in terms of their collective memory.
Pages: 643-663
Number of pages: 21