Unsupervised incremental acquisition of a thematic corpus from the Web

Cited by: 0
Authors
Duclaye, F [1]
Yvon, F [1]
Collin, O [1]
Affiliation
[1] France Telecom, R&D, F-22307 Lannion, France
Source
2003 INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND KNOWLEDGE ENGINEERING, PROCEEDINGS | 2003
Keywords
paraphrases; synonyms; machine learning; Web; automatic classification; EM
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper presents a nearly unsupervised learning methodology for automatically acquiring a thematic corpus from the Web. Relying on a bootstrapping mechanism, our system starts with a single linguistic expression of a given target semantic relationship. It then samples the Web so as to progressively accumulate a corpus of potential examples of the same relationship. Sampling steps alternate with filtering steps, making it possible to keep the corpus thematically focused. The corpus is finally analysed to search for potential paraphrases of the initial expression of the semantic relationship. These paraphrases will eventually be used to improve our question-answering system. This paper focuses on the learning aspect of the system and reports experimental results regarding the effectiveness of our filtering strategy.
Pages: 752-757
Number of pages: 6