Computational Curation and the Application of Large-Scale Vocabularies

Authors
Grabus, Sam [1]
Greenberg, Jane [1]
Affiliation
[1] Drexel Univ, Coll Comp & Informat, MRC, Philadelphia, PA 19104 USA
Source
2021 IEEE International Conference on Big Data (Big Data) | 2021
Keywords
controlled vocabularies; stemming; lemmatization; natural language processing (NLP); automatic curation
DOI
10.1109/BigData52589.2021.9671611
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents an exploratory case study comparing stemming and lemmatization results for the automatic application of large-scale controlled vocabularies processed against archival encyclopedia entries. It reports relative recall and precision evaluations for both approaches. The results show that while stemming achieves higher relative recall, lemmatization yields a higher relevance score and eliminates the over-stemming challenges. The findings provide insight into improving automatic curation workflows for archival resources.
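To make the trade-off described in the abstract concrete, the following is a minimal sketch in Python, not the authors' pipeline. Everything in it is a hypothetical assumption for illustration: the three-term vocabulary, the sample sentence, the crude suffix-stripping rule `toy_stem`, the hand-built lookup table behind `toy_lemma`, and the relevance judgments in `RELEVANT`. Relative recall is computed here against the pool of relevant terms found by either method.

```python
import re

# All data below is hypothetical and chosen only to illustrate over-stemming.
VOCABULARY = ["university", "organization", "colonization"]
TEXT = "The universe of organs: universities in colonized regions"
RELEVANT = {"university", "colonization"}  # assumed human relevance judgments

# Crude suffix stripping (an assumption, not a real stemming algorithm).
SUFFIXES = ("ization", "ities", "ized", "ity", "es", "s", "e")

# Tiny hand-built lemma table standing in for a real lemmatizer.
LEMMAS = {
    "universities": "university",
    "organs": "organ",
    "colonized": "colonize",
    "regions": "region",
}


def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def toy_lemma(word):
    """Look up the dictionary form; unknown words are returned unchanged."""
    return LEMMAS.get(word, word)


def match_terms(text, normalize):
    """Return vocabulary terms whose normalized form matches a text token."""
    tokens = re.findall(r"[a-z]+", text.lower())
    index = {normalize(term): term for term in VOCABULARY}
    return {index[normalize(t)] for t in tokens if normalize(t) in index}


def evaluate(found, pooled_relevant):
    """Return (precision, relative recall) for one set of assigned terms."""
    hits = found & RELEVANT
    precision = len(hits) / len(found) if found else 0.0
    relative_recall = len(hits) / len(pooled_relevant) if pooled_relevant else 0.0
    return precision, relative_recall


stem_found = match_terms(TEXT, toy_stem)
lemma_found = match_terms(TEXT, toy_lemma)
pooled = (stem_found | lemma_found) & RELEVANT  # relevant terms found by either method

for name, found in (("stemming", stem_found), ("lemmatization", lemma_found)):
    precision, relative_recall = evaluate(found, pooled)
    print(name, sorted(found), round(precision, 2), round(relative_recall, 2))
```

In this toy setup the stemmer conflates "universe" with "university" and "organs" with "organization", so it assigns one irrelevant term but also catches "colonization" through "colonized"; the lemma lookup assigns only "university". That mirrors the qualitative pattern the abstract reports, but the numbers carry no evidential weight.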
Pages: 2220-2223
Page count: 4