Automatic extraction of table metadata from digital documents

Cited by: 0

Authors:

Liu, Ying [1]

Mitra, Prasenjit [1]

Giles, C. Lee [1]

Bai, Kun [1]

Affiliation:

[1] Penn State Univ, Coll Informat Sci & Technol, University Pk, PA 16802 USA

Source:

OPENING INFORMATION HORIZONS | 2006

Keywords:

metadata extraction; table detection; table structure recognition; searching; exchanging

DOI:

Not available

Chinese Library Classification:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Tables are used to present, list, summarize, and structure important data in documents. In scholarly articles, they are often used to present the relationships among data and to highlight a collection of results obtained from experiments and scientific analysis. In digital libraries, extracting this data automatically and understanding the structure and content of tables are very important to many applications. Automatic identification, extraction, and search of table contents can be made more precise with the help of metadata. In this paper, we propose a set of medium-independent table metadata to facilitate table indexing, searching, and exchange. To extract the contents of tables and their metadata, an automatic table metadata extraction algorithm is designed and tested on PDF documents.


Pages: 339 / +

Page count: 2
