A Metadata Extractor for Books in a Digital Library

Times cited: 0
Authors
Akhtar, Sk Simran [1]
Sanyal, Debarshi Kumar [2]
Chattopadhyay, Samiran [1]
Bhowmick, Plaban Kumar [2]
Das, Partha Pratim [2]
Affiliations
[1] Jadavpur Univ, Kolkata 700098, W Bengal, India
[2] Indian Inst Technol Kharagpur, Kharagpur 721302, W Bengal, India
Source
Maturity and Innovation in Digital Libraries, ICADL 2018 | 2018 | Vol. 11279
Keywords
Metadata extraction; Digital Library; Rule-based system; Recognition
DOI
10.1007/978-3-030-04257-8_33
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Books form a significant part of the National Digital Library of India (NDLI). However, extracting metadata from these books is difficult owing to variations in layout style, the use of graphic fonts, and background images. This paper presents a lightweight tool that automatically extracts metadata from academic books. We also report the results of a preliminary evaluation of the tool on school books indexed in NDLI.
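The abstract does not spell out the extraction rules. Purely as an illustration of what a lightweight rule-based extractor can look like (this is not the authors' tool), the Python sketch below applies three hypothetical rules to text lines taken from a book's title page: the title is the text set in the largest font, the authors follow a leading "by", and an ISBN is located with a regular expression. The `TextLine` input format, the `extract_metadata` function, the rule set, and the sample page are all assumptions made for this example.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class TextLine:
    """One line of title-page text with its font size in points (hypothetical input format)."""
    text: str
    font_size: float


# Hypothetical patterns: an "ISBN" label followed by 10-17 digit/hyphen/space characters,
# and a line that begins with the word "by".
ISBN_RE = re.compile(r"ISBN(?:-1[03])?:?\s*([0-9][0-9\- ]{8,16}[0-9Xx])")
BY_RE = re.compile(r"^\s*by\s+(.+)$", re.IGNORECASE)


def extract_metadata(lines: list[TextLine]) -> dict[str, object]:
    """Apply a few hand-written rules to title-page lines (illustrative sketch only)."""
    metadata: dict[str, object] = {"title": None, "authors": [], "isbn": None}
    non_empty = [ln for ln in lines if ln.text.strip()]
    if not non_empty:
        return metadata

    # Rule 1 (assumed): the title is all text set in the largest font on the page.
    max_size = max(ln.font_size for ln in non_empty)
    metadata["title"] = " ".join(
        ln.text.strip() for ln in non_empty if ln.font_size == max_size
    )

    for ln in non_empty:
        # Rule 2 (assumed): a line starting with "by" lists authors separated by commas or "and".
        by_match = BY_RE.match(ln.text)
        if by_match:
            names = re.split(r",\s*|\s+and\s+", by_match.group(1))
            metadata["authors"] = [n.strip() for n in names if n.strip()]

        # Rule 3 (assumed): keep the first ISBN found, normalized to digits and a final X.
        isbn_match = ISBN_RE.search(ln.text)
        if isbn_match and metadata["isbn"] is None:
            metadata["isbn"] = re.sub(r"[^0-9Xx]", "", isbn_match.group(1)).upper()

    return metadata


if __name__ == "__main__":
    # Made-up title page used only to exercise the rules.
    page = [
        TextLine("Mathematics", 28.0),
        TextLine("Textbook for Class IX", 14.0),
        TextLine("by A. Author and B. Writer", 12.0),
        TextLine("ISBN 978-1-234-56789-7", 9.0),
    ]
    print(extract_metadata(page))
```

Running the script on the made-up page yields the title "Mathematics", the authors "A. Author" and "B. Writer", and the ISBN "9781234567897". Hand-written rules like these are cheap to run and easy to inspect, but each new layout variant (graphic fonts, text drawn over background images) usually needs its own rule, which is the kind of variation the abstract identifies as the main difficulty.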
Pages: 323-327
Number of pages: 5