Active Learning of Predefined Models for Information Extraction: Selecting Regular Expressions from Examples

Cited by: 1
Authors
Bartoli, Alberto [1]
De Lorenzo, Andrea [1]
Medvet, Eric [1]
Tarlao, Fabiano [1, 2]
Affiliations
[1] Univ Trieste, Dept Engn & Architecture, Trieste, Italy
[2] Univ Trieste, Dept Math & Geosci, Trieste, Italy
Source
FUZZY SYSTEMS AND DATA MINING V (FSDM 2019) | 2019 / Vol. 320
Keywords
Active learning; information extraction; regex; regular expressions
DOI
10.3233/FAIA190232
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We consider the problem of constructing a regular expression for information extraction automatically, based only on examples of the desired extraction behavior. We describe an active learning framework that is not aimed at synthesizing a solution from scratch, but rather is aimed at selecting a solution from a set of more than 3000 solutions that have already proven useful in a broad range of practical applications. The user provides only one example of desired extraction and then interactively annotates text snippets selected by the system. The system constructs such queries based on uncertainty sampling, i.e., by selecting the snippet on which it is most uncertain at each learning step. The resulting framework allows solving many practical extraction problems quickly and simply.
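The selection loop described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the four-pattern pool `CANDIDATES` stands in for the pool of more than 3000 regular expressions, the helper names are hypothetical, and uncertainty is measured here as the entropy of the extraction outcomes among the candidates still consistent with the annotations, which is only one possible way to realize uncertainty sampling.

```python
import re
from collections import Counter
from math import log2

# Hypothetical candidate pool (assumption); the paper's pool has 3000+ regexes.
CANDIDATES = [r"\d+", r"\d{4}", r"\d{2}/\d{2}/\d{4}", r"[A-Z][a-z]+"]


def extract(pattern, text):
    """Return the tuple of substrings that `pattern` extracts from `text`."""
    return tuple(m.group(0) for m in re.finditer(pattern, text))


def consistent(candidates, labeled):
    """Keep the candidates whose extractions match every annotated snippet."""
    return [p for p in candidates
            if all(extract(p, text) == wanted for text, wanted in labeled)]


def uncertainty(candidates, snippet):
    """Entropy of the extraction outcomes that the candidates produce."""
    counts = Counter(extract(p, snippet) for p in candidates)
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())


def next_query(candidates, unlabeled):
    """Pick the unlabeled snippet on which the candidates disagree most."""
    return max(unlabeled, key=lambda s: uncertainty(candidates, s))


# One initial example from the user: extract "2019" from "Call 2019".
labeled = [("Call 2019", ("2019",))]
alive = consistent(CANDIDATES, labeled)

# The system asks about the snippet that best separates the survivors.
unlabeled = ["room 7", "year 1999", "no digits"]
query = next_query(alive, unlabeled)

# The user annotates the queried snippet, which narrows the pool further.
labeled.append((query, ("7",)))
alive_after = consistent(CANDIDATES, labeled)
```

In this toy run, two patterns survive the single initial example, and the query goes to the only snippet on which they disagree, so one more annotation is enough to single out one pattern.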
Pages: 645-651
Page count: 7