An effective LDA-based time topic model to improve blog search performance

Cited by: 27
Authors
Chen, Lin-Chih [1]
Affiliation
[1] Natl Dong Hwa Univ, Dept Informat Management, 1,Sec 2,Hsueh Rd, Hualien 97401, Taiwan
Keywords
Blog search; Blog post; Time relationship; Natural language processing; Semantic analysis model; SIMILARITY; TEXT; CONTEXT;
DOI
10.1016/j.ipm.2017.08.001
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Blog search engines and general search engines both automatically crawl web pages from the Internet and produce search results for users. One difference between the two is that blog search engines focus on posts and ignore other kinds of pages, so a general search engine always indexes more pages than a blog search engine indexes posts. This focus lets bloggers concentrate on the posts they are interested in, rather than on other types of pages. The other difference is that posts involve more time-related issues than general pages. For a general page, a general search engine can often show only the last update time, whereas for a post, a blog search engine can display various possible times. For frequently updated posts, the time factor can help bloggers find information more efficiently. In this paper, we first use several well-known semantic analysis models to analyze blog search performance. Next, we consider the time relationship between posts to further improve that performance. Finally, we present experiments that simulate various possible scenarios to confirm the effectiveness of this relationship. The contributions of this paper are twofold. First, we build a high-performance system that considers the importance of blog topics at different times. Second, we consider the time relationship between posts, which makes it possible to rank relevant blog topics by the popularity of the posts. (C) 2017 Elsevier Ltd. All rights reserved.
Pages: 1299-1319
Number of pages: 21