A Semi-automatic Data Extraction System for Heterogeneous Data Sources: a Case Study from Cotton Industry

Cited by: 0

Authors:

Nayak, Richi [1,2]

Balasubramaniam, Thirunavukarasu [1,2]

Kutty, Sangeetha [1,2]

Banduthilaka, Sachindra [3]

Peterson, Erin [4]

Affiliations:

[1] Queensland Univ Technol, Sch Comp Sci, Brisbane, Qld, Australia

[2] Queensland Univ Technol, Ctr Data Sci, Brisbane, Qld, Australia

[3] Redeye Apps Pvt Ltd, Brisbane, Qld, Australia

[4] Erin Peterson Consulting, Brisbane, Qld, Australia

Source:

DATA MINING, AUSDM 2021 | 2021 / Vol. 1504

Keywords:

Information extraction; Focused information retrieval; Automated discovery; NER; Chunking; Unstructured data; Web;

DOI:

10.1007/978-981-16-8531-6_15

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

With recent developments in digitisation, an increasing number of documents are available online. Several information extraction tools can extract information from digitised documents. However, identifying precise answers to a given query is often challenging, especially if the data source where the relevant information resides is unknown. The situation becomes more complex when the data sources are available in multiple formats such as PDF, tables and HTML. In this paper, we propose a novel data extraction system that discovers relevant and focused information from diverse unstructured data sources using text mining approaches. We perform a qualitative analysis to evaluate the proposed system and its suitability and adaptability, using the cotton industry as a case study.

Pages: 209-222

Page count: 14
