Extracting Threshold Conceptual Structures from Web Documents

Cited by: 3
Authors
Ciobanu, Gabriel [1 ]
Horne, Ross [1 ]
Vaideanu, Cristian [2 ]
Affiliations
[1] Romanian Acad, Inst Comp Sci, Iasi, Romania
[2] AI Cuza Univ Ia, Fac Math, Iasi, Romania
Source
GRAPH-BASED REPRESENTATION AND REASONING | 2014 / Vol. 8577
DOI
10.1007/978-3-319-08389-6_12
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper we describe an iterative approach based on formal concept analysis to refine the information retrieval process. Based on weights for ranking documents we define a weighted formal context. We use a Galois connection to introduce a new type of formal concept that allows us to work with specific thresholds for searching words in Web documents. By increasing the threshold, we obtain smaller lattices with more relevant concepts, thus improving the retrieval of more specific items. We use techniques for processing large data sets in parallel, to generate sequences of Galois lattices, overcoming the time complexity of building a lattice for an entire large context.
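The thresholding idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a weighted context stored as a hypothetical `weights` dictionary (document, then term, then weight), assumes a term counts as present in a document when its weight is at least the threshold `t`, and enumerates concepts by brute force over term subsets, which is only feasible for toy data. The names `extent`, `intent`, and `concepts` are illustrative.

```python
from itertools import combinations

# Toy weighted context (assumed data): document -> {term: weight}.
weights = {
    "d1": {"lattice": 0.9, "galois": 0.7, "web": 0.2},
    "d2": {"lattice": 0.6, "galois": 0.1, "web": 0.8},
    "d3": {"lattice": 0.3, "web": 0.5},
}


def extent(weights, terms, t):
    """Documents in which every given term has weight >= t."""
    return frozenset(
        d for d, row in weights.items()
        if all(row.get(w, 0.0) >= t for w in terms)
    )


def intent(weights, docs, t, all_terms):
    """Terms whose weight is >= t in every given document."""
    return frozenset(
        w for w in all_terms
        if all(weights[d].get(w, 0.0) >= t for d in docs)
    )


def concepts(weights, t):
    """All (extent, intent) pairs of the context cut at threshold t.

    Brute force: close every subset of terms. Exponential in the
    number of terms, so suitable for small examples only.
    """
    all_terms = sorted({w for row in weights.values() for w in row})
    found = set()
    for r in range(len(all_terms) + 1):
        for subset in combinations(all_terms, r):
            docs = extent(weights, subset, t)
            found.add((docs, intent(weights, docs, t, all_terms)))
    return found


if __name__ == "__main__":
    for t in (0.5, 0.8):
        print(t, len(concepts(weights, t)))
```

On this toy data the cut at 0.5 yields 6 concepts and the cut at 0.8 yields 4. That is a property of this example, not a general guarantee of the sketch.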
Pages: 130 - 144
Page count: 15
Related papers
50 in total
  • [21] Extracting domain-specific terms from unlabeled web documents by bootstrapping and term classifiers
    Liu, Tao
    Wang, Xiao-Long
    Liu, Bing-Quan
    Liu, Yuan-Chao
    Li, Ming-Hui
    2007 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS, VOLS 1-8, 2007, : 1536 - 1541
  • [22] Fixing the Threshold for Effective Detection of Near Duplicate Web Documents in Web Crawling
    Narayana, V. A.
    Premchand, P.
    Govardhan, A.
    ADVANCED DATA MINING AND APPLICATIONS, ADMA 2010, PT I, 2010, 6440 : 169 - 180
  • [23] EXTRACTING THE MAIN CONTENT OF WEB DOCUMENTS BASED ON A NAIVE SMOOTHING METHOD
    Mohammadzadeh, Hadi
    Gottron, Thomas
    Schweiggert, Franz
    Nakhaeizadeh, Gholamreza
    KDIR 2011: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND INFORMATION RETRIEVAL, 2011, : 470 - 475
  • [24] Extracting semantic relationships between terms from PC documents and its applications to web search personalization
    Ohshima, H
    Oyama, S
    Tanaka, K
    FRONTIERS OF WWW RESEARCH AND DEVELOPMENT - APWEB 2006, PROCEEDINGS, 2006, 3841 : 579 - 590
  • [25] A Study of Extracting Knowledge from Guideline Documents
    Taboada, M.
    Meizoso, M.
    Martinez, D.
    Tellado, S.
    COMPUTER AIDED SYSTEMS THEORY - EUROCAST 2009, 2009, 5717 : 195 - +
  • [26] Extracting Topical Phrases from Clinical Documents
    He, Yulan
    THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2016, : 2957 - 2963
  • [27] Extracting mathematical expressions from postscript documents
    Department of Precision Machinery and Precision Instrumentation, University of Science and Technology of China, Hefei 230027, China
    Unknown
    Shu Ju Cai Ji Yu Chu Li, 2008, 4 (454-458):
  • [28] Extracting Time Information from Korean Documents
    Lee, Seung-Dong
    Jeong, Young-Seob
    2023 IEEE INTERNATIONAL CONFERENCE ON BIG DATA AND SMART COMPUTING, BIGCOMP, 2023, : 407 - 409
  • [29] Extracting mathematical semantics from LATEX documents
    Stuber, J
    van den Brand, M
    PRINCIPLES AND PRACTICE OF SEMANTIC WEB REASONING, 2003, 2901 : 160 - 173
  • [30] Extracting digital fingerprints from Chinese documents
    Liu, Guo-Hua
    Ma, Hui-Dong
    Li, Xu
    Liang, Peng
    CIS: 2007 INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND SECURITY, PROCEEDINGS, 2007, : 438 - 441