A Semi-automatic Data Extraction System for Heterogeneous Data Sources: a Case Study from Cotton Industry

Times cited: 0
Authors
Nayak, Richi [1 ,2 ]
Balasubramaniam, Thirunavukarasu [1 ,2 ]
Kutty, Sangeetha [1 ,2 ]
Banduthilaka, Sachindra [3 ]
Peterson, Erin [4 ]
Affiliations
[1] Queensland Univ Technol, Sch Comp Sci, Brisbane, Qld, Australia
[2] Queensland Univ Technol, Ctr Data Sci, Brisbane, Qld, Australia
[3] Redeye Apps Pvt Ltd, Brisbane, Qld, Australia
[4] Erin Peterson Consulting, Brisbane, Qld, Australia
Source
DATA MINING, AUSDM 2021 | 2021 / Vol. 1504
Keywords
Information extraction; Focused information retrieval; Automated discovery; NER; Chunking; Unstructured data; Web;
DOI
10.1007/978-981-16-8531-6_15
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
With recent developments in digitisation, an increasing number of documents are available online, and several information extraction tools exist to extract information from these digitised documents. However, identifying precise answers to a given query is often challenging, especially when the data source containing the relevant information is unknown. The task becomes more complex when the data are available in multiple formats, such as PDF files, tables and HTML pages. In this paper, we propose a novel data extraction system, based on text mining approaches, to discover relevant and focused information from diverse unstructured data sources. We perform a qualitative analysis to evaluate the proposed system, including its suitability and adaptability, using the cotton industry as a case study.
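The abstract does not describe the system's internals, so the following is only a generic illustration of the problem it addresses, not the authors' pipeline. The Python sketch normalises two heterogeneous input formats (HTML pages and CSV tables) into plain-text passages and ranks them against a query by simple term overlap. The function names `html_to_passages`, `table_to_passages` and `rank`, as well as the overlap scoring, are hypothetical assumptions. PDF parsing is omitted, and the NER and chunking steps named in the keywords are not attempted here.

```python
import csv
import io
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text nodes from HTML, skipping <script>/<style> contents."""

    def __init__(self):
        super().__init__()
        self._skip = 0      # depth inside <script> or <style>
        self.chunks = []    # collected non-empty text nodes

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())


def html_to_passages(html):
    """Hypothetical helper: each non-empty HTML text node becomes one passage."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


def table_to_passages(csv_text):
    """Hypothetical helper: each CSV row becomes one 'header: value; ...' passage."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return ["; ".join(f"{k}: {v}" for k, v in row.items()) for row in reader]


def _tokens(text):
    """Lower-cased alphanumeric terms of a string, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rank(query, passages, top_k=3):
    """Rank passages by the fraction of query terms they contain (assumed scoring)."""
    q = _tokens(query)
    if not q:
        return []
    scored = [(len(q & _tokens(p)) / len(q), p) for p in passages]
    scored = [s for s in scored if s[0] > 0]          # drop passages with no overlap
    scored.sort(key=lambda s: s[0], reverse=True)     # stable: ties keep input order
    return scored[:top_k]


if __name__ == "__main__":
    html = "<h1>Cotton report</h1><p>Average cotton yield rose in 2020.</p>"
    table = "region,yield\nNamoi,10.5\nDarling Downs,9.8\n"
    passages = html_to_passages(html) + table_to_passages(table)
    for score, passage in rank("cotton yield Namoi", passages):
        print(f"{score:.2f}  {passage}")
```

Flattening every source into the same passage representation is what allows a single ranking function to search across formats without knowing in advance which source holds the answer. A real system would replace the term-overlap score with stronger text-mining components.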
Pages: 209–222
Number of pages: 14