PREFCA: A portal retrieval engine based on formal concept analysis

Cited by: 15
Authors
Negm, Eman [1]
AbdelRahman, Samir [1]
Bahgat, Reem [1]
Affiliation
[1] Cairo Univ, Fac Comp & Informat, Dept Comp Sci, Giza, Egypt
Keywords
Information retrieval; Formal concept analysis; Network analysis; Portal retrieval; INFORMATION-RETRIEVAL; CONCEPT LATTICES; TEXT RETRIEVAL; WEB; ALGORITHMS
DOI
10.1016/j.ipm.2016.08.002
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
The web is a network of linked sites in which each site forms either a physical portal or a standalone page. In the former case, the portal presents an access point to its embedded web pages, which coherently present a specific topic. In the latter case, millions of standalone web pages scattered throughout the web share the same topic and could be conceptually linked together to form virtual portals. Search engines have been developed to help users reach relevant pages efficiently and effectively. All known current search engine techniques rely on the web page as the basic atomic search unit. They ignore the conceptual links among the retrieved pages, which reveal implicit web-related meanings. However, a semantic model built for a whole portal may contain more semantic information than a model of scattered individual pages. In addition, user queries can be poor and contain imprecise terms that do not reflect the user's real intention. Consequently, retrieving the standalone individual pages that are directly related to the query may not satisfy the user's need. In this paper, we propose PREFCA, a Portal Retrieval Engine based on Formal Concept Analysis that relies on the portal as the main search unit. PREFCA consists of three phases. First, the information extraction phase extracts the portal's semantic data. Second, the formal concept analysis phase uses formal concept analysis to discover the conceptual links among portals and attributes. Finally, the information retrieval phase applies a proposed portal ranking method to retrieve ranked pairs of portals and embedded pages. Additionally, we apply network analysis rules to output some portal characteristics. We evaluated PREFCA using two data sets, namely the Forum for Information Retrieval Evaluation 2010 and ClueWeb09 (category B) test data, for physical and virtual portals respectively. PREFCA achieves higher F-measure accuracy and better Mean Average Precision ranking than other search engine approaches, namely Term Frequency Inverse Document Frequency (TF-IDF), Latent Semantic Analysis (LSA), and BM25 techniques, with comparable network analysis and efficiency results. It also attains high Mean Average Precision in comparison with learning-to-rank techniques. Moreover, PREFCA achieves better reach time than Carrot, a well-known topic-based search engine. (C) 2016 Elsevier Ltd. All rights reserved.
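As a rough illustration of the formal concept analysis phase mentioned in the abstract, the sketch below enumerates the formal concepts of a tiny portal-by-attribute context. It is not the authors' algorithm: the brute-force enumeration, the function names, and the portal and attribute names are all assumptions made for this example, and the approach only suits very small contexts.

```python
from itertools import combinations


def shared_attributes(context, portals):
    """Return the attributes common to every portal in `portals`.

    For an empty portal set this is, by convention, the full attribute set.
    """
    shared = set().union(*context.values())
    for portal in portals:
        shared &= context[portal]
    return frozenset(shared)


def portals_having(context, attributes):
    """Return the portals that carry every attribute in `attributes`."""
    return frozenset(p for p, attrs in context.items() if attributes <= attrs)


def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing every portal subset.

    Exponential in the number of portals; intended only as a demonstration.
    """
    concepts = set()
    portals = list(context)
    for size in range(len(portals) + 1):
        for subset in combinations(portals, size):
            intent = shared_attributes(context, subset)
            extent = portals_having(context, intent)
            concepts.add((extent, intent))
    return concepts


# Hypothetical context: each portal mapped to the attributes extracted from it.
example_context = {
    "portal_a": {"retrieval", "ranking"},
    "portal_b": {"retrieval", "lattice"},
    "portal_c": {"lattice"},
}

if __name__ == "__main__":
    for extent, intent in sorted(formal_concepts(example_context),
                                 key=lambda c: (len(c[0]), sorted(c[0]))):
        print(sorted(extent), sorted(intent))
```

Each resulting pair groups a maximal set of portals with the maximal set of attributes they share, which is the kind of conceptual link between portals and attributes that the abstract refers to.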
Pages: 203-222
Number of pages: 20