Rule Driven Spreadsheet Data Extraction from Statistical Tables: Case Study

Cited by: 3
Authors
Paramonov, Viacheslav [1, 2]
Shigarov, Alexey [1, 2]
Vetrova, Varvara [1, 3]
Affiliations
[1] Russian Acad Sci, Matrosov Inst Syst Dynam & Control Theory, Siberian Branch, Irkutsk, Russia
[2] Irkutsk State Univ, Inst Math & Informat Technol, Irkutsk, Russia
[3] Univ Canterbury, Sch Math & Stat, Christchurch, New Zealand
Source
INFORMATION AND SOFTWARE TECHNOLOGIES, ICIST 2021 | 2021 / Vol. 1486
Funding
Russian Science Foundation
Keywords
Table understanding; Data transformation; Table extraction; Table analysis; Spreadsheet; Table header; Heuristics; Case study; Rules;
DOI
10.1007/978-3-030-88304-1_7
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Spreadsheet tables are one of the most commonly used formats for organising and storing sets of statistical, financial, accounting and other types of data. This form of data representation is widely used in science, education, engineering, and business. The key feature of spreadsheet tables is that they are generally created by people to be used by other people rather than by automated programs. During spreadsheet creation, consideration is rarely given to the possibility of further automated data processing. This leads to a large variety of possible spreadsheet table structures and complicates automated extraction of table content and table understanding. One of the key factors that influence the quality of table understanding by machines is the correctness of the header structure, for example, the positions of cells and the relations between them. In this paper, we present a case study of a tabular data extraction approach and estimate its performance on a variety of datasets. The rule-driven software platform TabbyXL was used for tabular data extraction and canonicalisation. The experiment was conducted on real-world tables from the SAUS200 (The 2010 Statistical Abstract of the United States) corpus. For the evaluation, we used three variants: spreadsheet tables as they are presented in SAUS; the same tables with an automatically corrected header structure; and tables whose header structure was corrected by experts. The case study results demonstrate the importance of header structure correctness for automated table processing and understanding. The ground-truth preparation procedures, example rules describing relationships between table elements, and the results of the evaluation are presented in the paper.
Pages: 84-95
Number of pages: 12