PREFCA: A portal retrieval engine based on formal concept analysis

Cited by: 15
Authors
Negm, Eman [1]
AbdelRahman, Samir [1]
Bahgat, Reem [1]
Affiliations
[1] Cairo Univ, Fac Comp & Informat, Dept Comp Sci, Giza, Egypt
Keywords
Information retrieval; Formal concept analysis; Network analysis; Portal retrieval; INFORMATION-RETRIEVAL; CONCEPT LATTICES; TEXT RETRIEVAL; WEB; ALGORITHMS;
DOI
10.1016/j.ipm.2016.08.002
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
The web is a network of linked sites in which each site forms either a physical portal or a standalone page. A physical portal is an access point to its embedded web pages, which together present a specific topic coherently. In the latter case, millions of standalone web pages scattered across the web share the same topic and could be conceptually linked to form virtual portals. Search engines were developed to help users reach suitable pages efficiently and effectively. All known current search engine techniques rely on the web page as the basic atomic search unit. They ignore the conceptual links among the retrieved pages, which reveal implicit web-related meanings. However, a semantic model of a whole portal may carry more semantic information than a model of scattered individual pages. In addition, user queries can be poor and contain imprecise terms that do not reflect the user's real intention. Consequently, retrieving only the standalone pages that are directly related to the query may not satisfy the user's need. In this paper, we propose PREFCA, a Portal Retrieval Engine based on Formal Concept Analysis that relies on the portal as the main search unit. PREFCA consists of three phases. First, the information extraction phase extracts the portal's semantic data. Second, the formal concept analysis phase uses formal concept analysis to discover the conceptual links among portals and attributes. Finally, the information retrieval phase applies a proposed portal ranking method to retrieve ranked pairs of portals and embedded pages. Additionally, we apply network analysis rules to output some portal characteristics. We evaluated PREFCA on two data sets, the Forum for Information Retrieval Evaluation 2010 and ClueWeb09 (category B) test data, for physical and virtual portals respectively. PREFCA achieves a higher F-measure, better Mean Average Precision (MAP) ranking, and comparable network analysis and efficiency results relative to other search engine approaches, namely Term Frequency Inverse Document Frequency (TF-IDF), Latent Semantic Analysis (LSA), and BM25. It also attains high MAP in comparison with learning-to-rank techniques. Moreover, PREFCA achieves better reach time than Carrot, a well-known topic-based search engine. © 2016 Elsevier Ltd. All rights reserved.
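The formal concept analysis phase mentioned in the abstract can be illustrated with a minimal sketch. It is not PREFCA's implementation: the toy context, the portal and attribute names, and the helper functions `extent`, `intent`, and `formal_concepts` are all assumptions made for illustration, and the brute-force enumeration is only practical for very small contexts. It shows how a binary portal-attribute context yields formal concepts, that is, pairs of a portal set and the attribute set they share.

```python
from itertools import combinations

# Hypothetical toy context: portal name -> set of extracted attributes.
context = {
    "portal_a": {"retrieval", "lattice"},
    "portal_b": {"retrieval", "ranking"},
    "portal_c": {"retrieval", "lattice", "ranking"},
}

def extent(context, attrs):
    """Portals that have every attribute in attrs."""
    return frozenset(o for o, a in context.items() if attrs <= a)

def intent(context, objs, all_attrs):
    """Attributes shared by every portal in objs (all attributes if objs is empty)."""
    shared = set(all_attrs)
    for o in objs:
        shared &= context[o]
    return frozenset(shared)

def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing every subset of portals.

    Exponential in the number of portals; a sketch, not a scalable algorithm.
    """
    all_attrs = frozenset().union(*context.values())
    objects = list(context)
    concepts = set()
    for r in range(len(objects) + 1):
        for objs in combinations(objects, r):
            b = intent(context, objs, all_attrs)  # shared attributes
            a = extent(context, b)                # closure: all portals sharing them
            concepts.add((a, b))
    return concepts

concepts = formal_concepts(context)
for a, b in sorted(concepts, key=lambda c: len(c[0])):
    print(sorted(a), sorted(b))

# Portals whose attributes cover a hypothetical query term.
print(sorted(extent(context, frozenset({"lattice"}))))
```

Each resulting pair is closed in both directions: the portals in the extent are exactly those sharing the intent, which is the kind of conceptual link between portals and attributes that the abstract refers to. How PREFCA ranks portals from such concepts is not specified in the abstract and is not modeled here.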
Pages: 203-222
Page count: 20