Automatic extraction of table metadata from digital documents

Cited by: 0
Authors
Liu, Ying [1]
Mitra, Prasenjit [1]
Giles, C. Lee [1]
Bai, Kun [1]
Affiliation
[1] Penn State Univ, Coll Informat Sci & Technol, University Pk, PA 16802 USA
Source
OPENING INFORMATION HORIZONS | 2006
Keywords
metadata extraction; table detection; table structure recognition; searching; exchanging;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Tables are used to present, list, summarize, and structure important data in documents. In scholarly articles, they are often used to present the relationships among data and highlight a collection of results obtained from experiments and scientific analysis. In digital libraries, extracting this data automatically and understanding the structure and content of tables are very important to many applications. Automatic identification, extraction, and search of table contents can be made more precise with the help of metadata. In this paper, we propose a set of medium-independent table metadata to facilitate table indexing, searching, and exchange. To extract the contents of tables and their metadata, an automatic table metadata extraction algorithm is designed and tested on PDF documents.
Pages: 339 / +
Page count: 2