Rule-Based Canonicalization of Arbitrary Tables in Spreadsheets

Times cited: 11
Authors
Shigarov, Alexey O. [1]
Paramonov, Viacheslav V. [1]
Belykh, Polina V. [1]
Bondarev, Alexander I. [1]
Affiliation
[1] Matrosov Institute for System Dynamics and Control Theory, Siberian Branch of the Russian Academy of Sciences (SB RAS), Irkutsk, Russia
Source
INFORMATION AND SOFTWARE TECHNOLOGIES, ICIST 2016 | 2016 / Vol. 639
Funding
Russian Foundation for Basic Research
Keywords
Unstructured data integration; Table understanding; Table analysis and interpretation; Spreadsheet data transformation; Ontology generation; Web
DOI
10.1007/978-3-319-46254-7_7
Chinese Library Classification
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Arbitrary tables presented in spreadsheets can be an important data source in business intelligence. However, many of them have complex layouts that hinder extracting, transforming, and loading (ETL) their data into a database. The paper addresses rule-based transformation of data from arbitrary spreadsheet tables into a structured canonical form that regular ETL tools can load into a database. We propose a system for canonicalizing arbitrary tables presented in spreadsheets, which implements our methodology for rule-based table analysis and interpretation. The system executes rules written in CRL, our specialized rule language, to recover implicit relationships in a table. Our experimental results show that dedicated CRL programs can be developed for different sets of tables with similar features to automate table canonicalization with high accuracy.
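To make the idea of rule-based canonicalization concrete, the following Python sketch applies two hypothetical rules to a small grid of cells: one assigns each cell a role (column heading, row heading, or data), and the other links each data cell to its headings. The result is one record per data value. This is not CRL and not the authors' system; the cell model, the rule functions `rule_classify` and `rule_associate`, and the three-field canonical record (value, row heading, column heading) are assumptions made purely for illustration. Real tables with merged cells or multi-level headings would need more elaborate rules.

```python
from dataclasses import dataclass, field


# Hypothetical cell model (an assumption; CRL's own data model is not shown here).
@dataclass
class Cell:
    row: int
    col: int
    text: str
    role: str = ""                                # assigned by rules
    headings: list = field(default_factory=list)  # headings linked to a data cell


def rule_classify(cells):
    """Hypothetical rule: row 0 holds column headings, column 0 holds row headings."""
    for c in cells:
        if c.row == 0 and c.col == 0:
            c.role = "stub"          # top-left corner cell
        elif c.row == 0:
            c.role = "col_heading"
        elif c.col == 0:
            c.role = "row_heading"
        else:
            c.role = "data"


def rule_associate(cells):
    """Hypothetical rule: link each data cell to the headings of its row and column."""
    index = {(c.row, c.col): c for c in cells}
    for c in cells:
        if c.role == "data":
            c.headings = [index[(c.row, 0)].text, index[(0, c.col)].text]


def canonicalize(grid, rules):
    """Apply rules in order, then emit one (value, row heading, column heading) record per data cell."""
    cells = [Cell(r, k, text) for r, row in enumerate(grid) for k, text in enumerate(row)]
    for rule in rules:
        rule(cells)
    return [(c.text, *c.headings) for c in cells if c.role == "data"]


if __name__ == "__main__":
    grid = [["", "2015", "2016"],
            ["Sales", "10", "12"],
            ["Costs", "7", "8"]]
    print(canonicalize(grid, [rule_classify, rule_associate]))
    # [('10', 'Sales', '2015'), ('12', 'Sales', '2016'), ('7', 'Costs', '2015'), ('8', 'Costs', '2016')]
```

Keeping each rule as a separate function that only reads and annotates cells mirrors, in a simplified way, the idea of expressing table interpretation as a sequence of rules rather than as hard-coded parsing logic. Swapping in a different rule list adapts the sketch to another family of layouts.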
Pages: 78-91
Number of pages: 14