Automatic extraction of table metadata from digital documents

Cited by: 0
Authors
Liu, Ying [1]
Mitra, Prasenjit [1]
Giles, C. Lee [1]
Bai, Kun [1]
Affiliations
[1] Penn State Univ, Coll Informat Sci & Technol, University Pk, PA 16802 USA
Source
OPENING INFORMATION HORIZONS | 2006
Keywords
metadata extraction; table detection; table structure recognition; searching; exchanging
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Tables are used to present, list, summarize, and structure important data in documents. In scholarly articles, they are often used to present the relationships among data and to highlight a collection of results obtained from experiments and scientific analysis. In digital libraries, extracting this data automatically and understanding the structure and content of tables are very important to many applications. Automatic identification, extraction, and search of table contents can be made more precise with the help of metadata. In this paper, we propose a set of medium-independent table metadata to facilitate table indexing, searching, and exchanging. To extract the contents of tables and their metadata, an automatic table metadata extraction algorithm is designed and tested on PDF documents.
Pages: 339 / +
Page count: 2
Related papers
50 in total
  • [1] Workflow of Metadata Extraction from Retro-Born Digital Documents
    Tkaczyk, Dominika
    Bolikowski, Lukasz
    DML 2011: TOWARDS A DIGITAL MATHEMATICS LIBRARY, 2011, : 39 - 44
  • [2] Automatic Metadata Extraction From Iranian Theses And Dissertations
    Rahnama, Mohadese
    Hasheminejad, Seyed Mohammad Hossein
    Nasiri, Jalal A.
    2020 6TH IRANIAN CONFERENCE ON SIGNAL PROCESSING AND INTELLIGENT SYSTEMS (ICSPIS), 2020,
  • [3] CERMINE: automatic extraction of structured metadata from scientific literature
    Dominika Tkaczyk
    Paweł Szostek
    Mateusz Fedoryszak
    Piotr Jan Dendek
    Łukasz Bolikowski
    International Journal on Document Analysis and Recognition (IJDAR), 2015, 18 : 317 - 335
  • [4] CERMINE - automatic extraction of metadata and references from scientific literature
    Tkaczyk, Dominika
    Szostek, Pawel
    Dendek, Piotr Jan
    Fedoryszak, Mateusz
    Bolikowski, Lukasz
    2014 11TH IAPR INTERNATIONAL WORKSHOP ON DOCUMENT ANALYSIS SYSTEMS (DAS 2014), 2014, : 217 - 221
  • [5] CERMINE: automatic extraction of structured metadata from scientific literature
    Tkaczyk, Dominika
    Szostek, Pawel
    Fedoryszak, Mateusz
    Dendek, Piotr Jan
    Bolikowski, Lukasz
    INTERNATIONAL JOURNAL ON DOCUMENT ANALYSIS AND RECOGNITION, 2015, 18 (04) : 317 - 335
  • [6] Vision and Natural Language for Metadata Extraction from Scientific PDF Documents: A Multimodal Approach
    Boukhers, Zeyd
    Bouabdallah, Azeddine
    2022 ACM/IEEE JOINT CONFERENCE ON DIGITAL LIBRARIES (JCDL), 2022,
  • [7] Text and metadata extraction from scanned Arabic documents using support vector machines
    Qin, Wenda
    Elanwar, Randa
    Betke, Margrit
    JOURNAL OF INFORMATION SCIENCE, 2022, 48 (02) : 268 - 279
  • [8] Implicit Semantics Based Metadata Extraction and Matching of Scholarly Documents
    Jiang, Congfeng
    Liu, Junming
    Ou, Dongyang
    Wang, Yumei
    Yu, Lifeng
    JOURNAL OF DATABASE MANAGEMENT, 2018, 29 (02) : 1 - 22
  • [9] End-to-end table structure recognition and extraction in heterogeneous documents
    Kashinath, Tejas
    Jain, Twisha
    Agrawal, Yash
    Anand, Tanvi
    Singh, Sanjay
    APPLIED SOFT COMPUTING, 2022, 123
  • [10] Automatic Extraction of Dublin Core Metadata from Presidential E-records
    Underwood, William
    2020 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2020, : 1931 - 1938