More of that, please: Domain Adaptation of Information Extraction through Examples & Feedback

Times cited: 0

Authors:

Haettasch, Benjamin [1]

Binnig, Carsten [1]

Affiliation:

[1] Tech Univ Darmstadt, DFKI, Darmstadt, Germany

Source:

WORKSHOP ON HUMAN-IN-THE-LOOP DATA ANALYTICS, HILDA 2024 | 2024

DOI:

10.1145/3665939.3665966

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Automatic information extraction, e.g., into a tabular format, is crucial for leveraging the knowledge contained in large text collections. Yet building such extraction pipelines for custom target attributes can incur high overhead, while off-the-shelf tools may miss domain-specific information. In this paper, we therefore propose an interactive system that augments generic extractions and aligns them with a target definition. The necessary domain adaptation is achieved through examples that users provide while interacting with the system. As part of this, we propose several low-overhead extractors and evaluate them both individually and end-to-end to demonstrate how our approach minimizes the number of interactions required. We publish our code as open source.

References

Pages: 7

16 references in total

[1] Arora, Simran; Yang, Brandon; Eyuboglu, Sabri; Narayan, Avanika; Hojel, Andrew; Trummer, Immanuel; Re, Christopher. Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes. PROCEEDINGS OF THE VLDB ENDOWMENT, 2023, 17(2): 92-105
[2] Douze M, 2024, Arxiv, DOI arXiv:2401.08281
[3] Elsahar H, 2018, PROCEEDINGS OF THE ELEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2018), P3448
[4] Fang Z, 2021, 2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), P198
[5] Ferreira M, 2020, Arxiv, DOI arXiv:2012.14235
[6] Haettasch, Benjamin; Bodensohn, Jan-Micha; Binnig, Carsten. Demonstrating ASET: Ad-hoc Structured Exploration of Text Collections. PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA (SIGMOD '22), 2022: 2393-2396
[7] Hattasch B, 2023, LNI, Vol. P-331, DOI 10.18420/BTW2023-08
[8] Hattasch B, 2021, CEUR Workshop Proceedings, Vol. 2950, P179
[9] Li, Jing; Sun, Aixin; Han, Jianglei; Li, Chenliang. A Survey on Deep Learning for Named Entity Recognition. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2022, 34(1): 50-70
[10] Ling X, 2012, Proceedings of the 26th AAAI Conference on Artificial Intelligence