Unsupervised incremental acquisition of a thematic corpus from the Web

Cited by: 0
Authors
Duclaye, F [1]
Yvon, F [1]
Collin, O [1]
Affiliation
[1] France Telecom, R&D, F-22307 Lannion, France
Source
2003 INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND KNOWLEDGE ENGINEERING, PROCEEDINGS | 2003
Keywords
paraphrases; synonyms; machine learning; Web; automatic classification; EM;
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper presents a nearly unsupervised learning methodology for automatically acquiring a thematic corpus from the Web. Relying on a bootstrapping mechanism, our system starts with a single linguistic expression of a given target semantic relationship. It then samples the Web to progressively accumulate a corpus of potential examples of the same relationship. Sampling steps alternate with filtering steps, which keeps the corpus thematically focused. The corpus is finally analysed to search for potential paraphrases of the initial expression of the semantic relationship. These paraphrases will eventually be used to improve our question-answering system. This paper focuses on the learning aspect of the system and reports experimental results regarding the effectiveness of our filtering strategy.
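The alternation of sampling and filtering steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' system: the function `bootstrap_corpus`, the in-memory `TOY_WEB` list standing in for the Web, and the word-overlap filter in `toy_keep` are all hypothetical stand-ins chosen for illustration, and the abstract does not specify how the actual sampling or filtering is done.

```python
from typing import Callable, Iterable, List, Set


def bootstrap_corpus(
    seed: str,
    sample: Callable[[Set[str]], Iterable[str]],
    keep: Callable[[str], bool],
    rounds: int = 3,
) -> List[str]:
    """Grow a corpus from one seed by alternating sampling and filtering."""
    corpus: List[str] = [seed]
    seen: Set[str] = {seed}
    for _ in range(rounds):
        # Sampling step: query with every word seen in the corpus so far.
        vocabulary = {w for s in corpus for w in s.lower().split()}
        accepted: List[str] = []
        for candidate in sample(vocabulary):
            if candidate in seen:
                continue
            seen.add(candidate)
            # Filtering step: discard candidates that drift off-topic.
            if keep(candidate):
                accepted.append(candidate)
        if not accepted:
            break  # nothing new survived the filter, so stop early
        corpus.extend(accepted)
    return corpus


# Hypothetical stand-in for the Web: a fixed list of sentences.
TOY_WEB = [
    "Melville wrote Moby Dick",
    "Melville is the author of Moby Dick",
    "Moby Dick was written by Melville",
    "Paris is the capital of France",
]
SEED = "Melville wrote Moby Dick"
SEED_WORDS = set(SEED.lower().split())


def toy_sample(vocabulary: Set[str]) -> List[str]:
    # Return every sentence sharing at least one word with the query vocabulary.
    return [s for s in TOY_WEB if vocabulary & set(s.lower().split())]


def toy_keep(candidate: str) -> bool:
    # Assumed filter: require at least two words in common with the seed.
    return len(set(candidate.lower().split()) & SEED_WORDS) >= 2


corpus = bootstrap_corpus(SEED, toy_sample, toy_keep)
```

In this toy run the second sampling round retrieves the unrelated sentence about Paris through shared function words such as "is" and "of", and the filter rejects it, which is the role the abstract assigns to the filtering steps.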
Pages: 752 - 757
Page count: 6