Transforming a nonstandard table into formalized tables

Cited by: 2
Authors
Su, Huili [1]
Li, Yukun [1, 2]
Wang, Xiaoye [1, 2]
Hao, Gang [1, 2]
Lai, Yongxuan [3]
Wang, Weiwei [4]
Affiliations
[1] Tianjin Univ Technol, Tianjin, Peoples R China
[2] Key Lab Intelligence Comp & Novel Software Techno, Tianjin, Peoples R China
[3] Xiamen Univ, Software Sch, Xiamen, Peoples R China
[4] Offshore Oil Engn Co Ltd, Tianjin, Peoples R China
Source
2017 14TH WEB INFORMATION SYSTEMS AND APPLICATIONS CONFERENCE (WISA 2017) | 2017
Keywords
Information Extraction; Relational Tables; 1NF; HTML
DOI
10.1109/WISA.2017.38
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Tables and spreadsheets on the Internet often contain valuable information, but they are created by people with different individual habits. As a result, similar data are often published with different structures, which limits the integration of such tables. This paper aims to overcome this problem by automatically analyzing the structure area of a table, and proposes a method for transforming such tables into formal relational tables. We propose methods for identifying the structure area, modeling the table structure as a tree, and generating the first normal form (1NF) schema of the original table. We prove that the method is semantically correct, and experimental results on tables from different domains demonstrate its effectiveness.
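The abstract does not spell out the algorithm. As a rough illustration of the general idea only, and not the authors' method, the Python sketch below assumes the column header of a nonstandard table has already been identified and modeled as a tree of header cells. Each root-to-leaf path then supplies a set of attribute values, and every data cell is unpivoted into one flat tuple, which gives a table in 1NF. The class `HeaderNode`, the functions `leaf_paths` and `to_1nf`, and the sample data are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class HeaderNode:
    """A header cell in a hypothetical tree model of a table's column header."""
    label: str
    children: list[HeaderNode] = field(default_factory=list)


def leaf_paths(node: HeaderNode, prefix: tuple[str, ...] = ()):
    """Yield the tuple of labels from a top-level header cell down to each leaf."""
    path = prefix + (node.label,)
    if not node.children:
        yield path
    else:
        for child in node.children:
            yield from leaf_paths(child, path)


def to_1nf(row_labels, header_roots, cells):
    """Unpivot a table with a nested column header into flat 1NF tuples.

    row_labels:   one label per data row
    header_roots: top-level header nodes, left to right
    cells:        data matrix; cells[i][j] is row i under leaf column j
    Assumes every leaf sits at the same depth, so all tuples have equal arity.
    """
    paths = [p for root in header_roots for p in leaf_paths(root)]
    if len({len(p) for p in paths}) > 1:
        raise ValueError("header tree is unbalanced; pad it before flattening")
    rows = []
    for label, values in zip(row_labels, cells):
        if len(values) != len(paths):
            raise ValueError("row width does not match the number of leaf columns")
        for path, value in zip(paths, values):
            # One atomic value per attribute: (row label, header path..., cell value)
            rows.append((label, *path, value))
    return rows


if __name__ == "__main__":
    # Hypothetical table: two years, each split into two quarters.
    header = [
        HeaderNode("2016", [HeaderNode("Q1"), HeaderNode("Q2")]),
        HeaderNode("2017", [HeaderNode("Q1"), HeaderNode("Q2")]),
    ]
    for row in to_1nf(["Sales", "Cost"], header, [[1, 2, 3, 4], [5, 6, 7, 8]]):
        print(row)
```

Here the output tuples fit a relation with a hypothetical schema such as (Item, Year, Quarter, Value); the first one is `('Sales', '2016', 'Q1', 1)`. Keeping the header hierarchy as separate attribute values, rather than concatenating labels into column names like "2016_Q1", is what keeps every field atomic. Nested row headers and unbalanced header trees would need further steps that this sketch omits.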
Pages: 311-316
Page count: 6