A Semi-automatic Data Extraction System for Heterogeneous Data Sources: a Case Study from Cotton Industry

Authors
Nayak, Richi [1, 2]
Balasubramaniam, Thirunavukarasu [1, 2]
Kutty, Sangeetha [1, 2]
Banduthilaka, Sachindra [3 ]
Peterson, Erin [4 ]
Affiliations
[1] Queensland Univ Technol, Sch Comp Sci, Brisbane, Qld, Australia
[2] Queensland Univ Technol, Ctr Data Sci, Brisbane, Qld, Australia
[3] Redeye Apps Pvt Ltd, Brisbane, Qld, Australia
[4] Erin Peterson Consulting, Brisbane, Qld, Australia
Source
DATA MINING, AUSDM 2021 | 2021 / Vol. 1504
Keywords
Information extraction; Focused information retrieval; Automated discovery; NER; Chunking; Unstructured data; Web;
DOI
10.1007/978-981-16-8531-6_15
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
With recent developments in digitisation, an increasing number of documents are available online. Several information extraction tools exist for extracting information from digitised documents. However, identifying precise answers to a given query is often challenging, especially when the data source containing the relevant information is unknown. The situation becomes more complex when the data sources come in multiple formats such as PDF, tables and HTML. In this paper, we propose a novel data extraction system that discovers relevant and focused information from diverse unstructured data sources using text mining approaches. We perform a qualitative analysis to evaluate the proposed system and its suitability and adaptability, using the cotton industry as a case study.
Pages: 209 - 222
Number of pages: 14