Labeling Blog Posts with Wikipedia Entries through LDA-Based Topic Modeling of Wikipedia

Cited by: 8
Authors
Makita, Kensaku [1 ]
Suzuki, Hiroko
Koike, Daichi [1 ]
Utsuro, Takehito [2 ]
Kawada, Yasuhide
Fukuhara, Tomohiro [3 ]
Affiliations
[1] Univ Tsukuba, Grad Sch Syst & Informat Engn, Dept Intelligent Interact Technol, Tsukuba, Ibaraki 305, Japan
[2] Univ Tsukuba, Fac Engn Informat & Syst, Div Intelligent Interact Technol, Tsukuba, Ibaraki 305, Japan
[3] Natl Inst Adv Ind Sci & Technol, Ctr Serv Res, Tokyo, Japan
Source
JOURNAL OF INTERNET TECHNOLOGY | 2013, Vol. 14, No. 2
Keywords
Blog; Wikipedia; Topic model; LDA; Topic analysis
DOI
10.6138/JIT.2013.14.2.13
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Given a search query, most existing search engines simply return a ranked list of results. Those results, however, are often a mixture of documents covering a variety of contents. To give a quick overview of how contents are distributed, this paper proposes a framework for labeling blog posts with Wikipedia entries through LDA (latent Dirichlet allocation) based topic modeling of Wikipedia. One of the most important advantages of this LDA-based document model is that the collected Wikipedia entries and their LDA parameters depend heavily on the distribution of keywords across all blog posts in the search result. This dependence helps in quickly overviewing the blog posts in the search result through the LDA-based topic distribution. We show that the LDA-based document retrieval scheme outperforms our previous approach. Finally, we compare the proposed approach with standard LDA-based topic modeling that does not use Wikipedia as a knowledge source. The two LDA-based topic modeling results differ considerably in nature and contribute to quickly overviewing the blog posts in the search result in a complementary fashion.
Pages: 297-306
Number of pages: 10
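
The abstract describes the framework only at a high level, so the following is a minimal sketch of the general idea, not the authors' implementation. It fits LDA to a few toy "Wikipedia entry" texts with a plain collapsed Gibbs sampler, infers a topic distribution for a blog post, and labels the post with the entries that weigh most on the post's dominant topic. The entry names, token lists, hyperparameters, and the helper names `fit_lda`, `infer_topics`, and `label_post` are all assumptions made for illustration.

```python
import random
from collections import Counter

def fit_lda(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA over tokenized documents."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]         # counts n(d, k)
    topic_word = [Counter() for _ in range(num_topics)]  # counts n(k, w)
    topic_total = [0] * num_topics                       # counts n(k)
    assign = []
    for d, doc in enumerate(docs):                       # random initial topics
        zs = [rng.randrange(num_topics) for _ in doc]
        for w, k in zip(doc, zs):
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assign[d][i]                         # remove current token
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta) / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k                         # add it back
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    # Smoothed estimates: phi[t][w] = P(word | topic), theta[d][t] = P(topic | doc)
    phi = [
        {w: (topic_word[t][w] + beta) / (topic_total[t] + V * beta) for w in vocab}
        for t in range(num_topics)
    ]
    theta = [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in counts]
        for counts, doc in zip(doc_topic, docs)
    ]
    return phi, theta

def infer_topics(tokens, phi, alpha=0.1, iters=50):
    """Fold a new document into fixed topics with simple EM updates."""
    K = len(phi)
    words = [w for w in tokens if w in phi[0]]           # skip unseen words
    theta = [1.0 / K] * K
    for _ in range(iters):
        counts = [alpha] * K
        for w in words:
            post = [theta[t] * phi[t][w] for t in range(K)]
            total = sum(post)
            for t in range(K):
                counts[t] += post[t] / total
        norm = sum(counts)
        theta = [c / norm for c in counts]
    return theta

def label_post(tokens, entry_names, phi, theta_entries, top_n=2):
    """Label a post with the entries strongest on its dominant topic."""
    post_theta = infer_topics(tokens, phi)
    best = max(range(len(post_theta)), key=post_theta.__getitem__)
    ranked = sorted(zip(entry_names, theta_entries),
                    key=lambda pair: pair[1][best], reverse=True)
    return post_theta, [name for name, _ in ranked[:top_n]]

# Toy data (hypothetical): tokenized text of four Wikipedia entries.
entry_names = ["Topic model", "Latent Dirichlet allocation", "Espresso", "Coffee bean"]
entry_docs = [
    "topic model document word distribution corpus topic word".split(),
    "dirichlet topic document word inference distribution topic".split(),
    "coffee espresso bean roast brew cup coffee".split(),
    "bean coffee roast plant seed brew coffee".split(),
]
phi, theta_entries = fit_lda(entry_docs, num_topics=2)
post = "my notes on topic model inference over a document corpus".split()
post_theta, labels = label_post(post, entry_names, phi, theta_entries)
print(post_theta, labels)
```

Labeling by the post's single dominant topic is a simplification chosen for brevity; ranking entries by similarity between full topic distributions would be an equally reasonable variant.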