An Indexing Network: Model and Applications

Cited by: 10
Authors
Jiang, Changjun [1]
Sun, Haichun [1]
Ding, Zhijun [1]
Wang, Pengwei [1]
Zhou, MengChu [2, 3]
Affiliations
[1] Tongji Univ, Minist Educ, Key Lab Embedded Syst & Serv Comp, Shanghai 201804, Peoples R China
[2] New Jersey Inst Technol, Dept Elect & Comp Engn, Newark, NJ 07102 USA
[3] Tongji Univ, Sch Elect & Informat Engn, Shanghai 201804, Peoples R China
Source
IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS | 2014 / Vol. 44 / No. 12
Funding
National Natural Science Foundation of China;
Keywords
Exploratory search; hyperlink; indexing network; webpage application; webpage management;
DOI
10.1109/TSMC.2014.2320695
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Internet data are heterogeneous, redundant, disordered, and growing exponentially, so finding the right information in them is an ever more challenging problem. Existing technologies such as inverted indexes and keyword matching can list the webpages that match given search keywords, but they cannot recognize potential relations among webpages and therefore cannot meet rising user needs such as exploratory search and personalized search. We propose an indexing network model that organizes the information in webpages at three levels (words, webpages, and categories), thereby yielding a semantic association graph. Words serve as descriptions of webpages and categories. Webpage classification gathers similar webpages together. Hyperlinks embody the wisdom of webpage creators, which can help us generate semantic relations among categories. With its clear organizational structure, an indexing network can support many important applications, including intelligent information retrieval, recommendation, and decision support. To provide access interfaces for the proposed indexing network, we define an indexing network algebra. Finally, to validate the proposed model, we generate an indexing network from 30 million webpages and analyze its structure. We also give methods for achieving "browsing navigation" and "personalized search" based on the generated network. The results indicate that an indexing network can greatly facilitate exploratory information retrieval and personalized search.
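The three-level organization described in the abstract can be illustrated with a small sketch. The abstract does not give the paper's actual construction or algebra, so everything below is an assumption made for illustration: the toy `PAGES` data, the function name `build_network`, and the rule that a hyperlink between pages of different categories counts as one unit of evidence for a relation between those categories. Category labels are taken as given here rather than produced by a classifier.

```python
from collections import Counter, defaultdict

# Hypothetical toy data (not from the paper):
# page id -> (category label, description words, outgoing hyperlinks)
PAGES = {
    "p1": ("sports", {"football", "league"}, ["p2", "p3"]),
    "p2": ("sports", {"football", "transfer"}, ["p3"]),
    "p3": ("finance", {"transfer", "market"}, ["p1"]),
}


def build_network(pages):
    """Build word->pages, category->pages, and category->category maps."""
    word_to_pages = defaultdict(set)      # word level -> webpage level
    category_to_pages = defaultdict(set)  # category level -> webpage level
    category_links = Counter()            # weighted directed category edges

    for page_id, (category, words, _links) in pages.items():
        category_to_pages[category].add(page_id)
        for word in words:
            word_to_pages[word].add(page_id)

    # Assumption: each hyperlink that crosses a category boundary adds
    # one unit of weight to the edge between the two categories.
    for _page_id, (category, _words, links) in pages.items():
        for target in links:
            if target not in pages:
                continue  # ignore links that leave the collection
            target_category = pages[target][0]
            if target_category != category:
                category_links[(category, target_category)] += 1

    return word_to_pages, category_to_pages, category_links


word_to_pages, category_to_pages, category_links = build_network(PAGES)
```

In this sketch, a keyword lookup goes through `word_to_pages`, while `category_links` gives the kind of category-to-category relation that could drive browsing navigation from one topic to a related one.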
Pages: 1633-1648
Page count: 16