Metadata Enrichment of Multi-disciplinary Digital Library: A Semantic-Based Approach

Cited by: 4
Authors
Al-Natsheh, Hussein T. [1, 2, 3]
Martinet, Lucie [2, 4]
Muhlenbach, Fabrice [1, 5]
Rico, Fabien [1, 6]
Zighed, Djamel Abdelkader [1, 2]
Affiliations
[1] Univ Lyon, Lyon, France
[2] Lyon 2, ERIC EA 3083, 5 Ave Pierre Mendes France, F-69676 Bron, France
[3] CNRS, MSH LSE, USR 2005, 14 Ave Berthelot, F-69363 Lyon 07, France
[4] CESI EXIA, LINEACT, 19 Ave Guy Collongue, F-69130 Ecully, France
[5] UJM St Etienne, CNRS, Lab Hubert Curien, UMR 5516, F-42023 St Etienne, France
[6] Lyon 1, ERIC EA 3083, 5 Ave Pierre Mendes France, F-69676 Bron, France
Source
DIGITAL LIBRARIES FOR OPEN KNOWLEDGE, TPDL 2018 | 2018 / Vol. 11057
Keywords
Semantic tagging; Digital libraries; Topic modeling; Multi-label classification; Metadata enrichment
DOI
10.1007/978-3-030-00066-0_3
Chinese Library Classification (CLC) code
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In scientific digital libraries, some papers from different research communities can be described by community-dependent keywords even if they share a semantically similar topic. Articles that are not tagged with enough keyword variations are poorly indexed in any information retrieval system, which limits potentially fruitful exchanges between scientific disciplines. In this paper, we introduce a novel, experimentally designed pipeline for multi-label semantic-based tagging developed for open-access metadata digital libraries. The approach starts by learning from a standard scientific categorization and a sample of topic-tagged articles to find semantically relevant articles and enrich their metadata accordingly. Our proposed pipeline aims to enable researchers to reach articles from various disciplines that tend to use different terminologies. It allows retrieving semantically relevant articles given a limited known variation of search terms. In addition to achieving an accuracy that is higher than an expanded-query-based method using a topic synonym set extracted from a semantic network, our experiments also show a higher computational scalability versus other comparable techniques. We created a new benchmark extracted from the open-access metadata of a scientific digital library and published it along with the experiment code to allow further research on the topic.
Pages: 32-43
Page count: 12