More of that, please: Domain Adaptation of Information Extraction through Examples & Feedback

Times cited: 0
Authors
Haettasch, Benjamin [1]
Binnig, Carsten [1]
Affiliations
[1] Tech Univ Darmstadt, DFKI, Darmstadt, Germany
Source
WORKSHOP ON HUMAN-IN-THE-LOOP DATA ANALYTICS, HILDA 2024 | 2024
DOI
10.1145/3665939.3665966
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Automatic information extraction, e.g., into a tabular format, is crucial for leveraging knowledge in large text collections. Yet, creating such extraction pipelines for custom target attributes can cause high overheads, while off-the-shelf tools might miss domain-specific information. Therefore, in this paper, we propose an interactive system that augments generic extractions and aligns them with a target definition. The necessary domain adaptation is reached through examples provided by the users during the interaction with the system. As part of this, we propose different low-overhead extractors and evaluate them individually and end-to-end to demonstrate how our approach minimizes the necessary interactions. We publish our code as open source.
Pages: 7