Labeling Blog Posts with Wikipedia Entries through LDA-Based Topic Modeling of Wikipedia

Cited by: 8
Authors
Makita, Kensaku [1]
Suzuki, Hiroko
Koike, Daichi [1]
Utsuro, Takehito [2]
Kawada, Yasuhide
Fukuhara, Tomohiro [3]
Affiliations
[1] Univ Tsukuba, Grad Sch Syst & Informat Engn, Dept Intelligent Interact Technol, Tsukuba, Ibaraki 305, Japan
[2] Univ Tsukuba, Fac Engn Informat & Syst, Div Intelligent Interact Technol, Tsukuba, Ibaraki 305, Japan
[3] Natl Inst Adv Ind Sci & Technol, Ctr Serv Res, Tokyo, Japan
Source
Journal of Internet Technology | 2013, Vol. 14, No. 2
Keywords
Blog; Wikipedia; Topic model; LDA; Topic analysis;
DOI
10.6138/JIT.2013.14.2.13
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Given a search query, most existing search engines simply return a ranked list of results. However, those results are often a mixture of documents covering a variety of content. To help users quickly overview the distribution of content, this paper proposes a framework for labeling blog posts with Wikipedia entries through LDA (latent Dirichlet allocation) based topic modeling of Wikipedia. One of the most important advantages of this LDA-based document model is that the collected Wikipedia entries and their LDA parameters depend heavily on the distribution of keywords across all the blog posts in the search results. This dependence helps in quickly overviewing the blog-post search results through the LDA-based topic distribution. We show that the LDA-based document retrieval scheme outperforms our previous approach. Finally, we compare the proposed approach with standard LDA-based topic modeling that does not use Wikipedia as a knowledge source. The two sets of topic modeling results differ considerably in nature and contribute to quickly overviewing the blog-post search results in a complementary fashion.
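The idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the estimation procedure or how labels are chosen, so the sketch assumes collapsed Gibbs sampling for LDA, a simple fold-in step to infer a blog post's topic distribution, and cosine similarity between topic distributions to pick the labeling entry. The entry titles, token lists, function names, and hyperparameters are all hypothetical.

```python
import math
import random


def fit_lda(docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return (vocab, phi, theta)."""
    rng = random.Random(seed)
    vocab = {w: i for i, w in enumerate(sorted({w for d in docs for w in d}))}
    V = len(vocab)
    n_kw = [[0] * V for _ in range(num_topics)]  # topic-word counts
    n_k = [0] * num_topics                       # tokens assigned to each topic
    n_dk = [[0] * num_topics for _ in docs]      # document-topic counts
    z = [[rng.randrange(num_topics) for _ in d] for d in docs]

    def bump(d, k, v, delta):
        n_kw[k][v] += delta
        n_k[k] += delta
        n_dk[d][k] += delta

    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            bump(d, z[d][i], vocab[w], 1)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = vocab[w]
                bump(d, z[d][i], v, -1)  # remove the token's current assignment
                weights = [
                    (n_dk[d][k] + alpha) * (n_kw[k][v] + beta) / (n_k[k] + V * beta)
                    for k in range(num_topics)
                ]
                z[d][i] = rng.choices(range(num_topics), weights=weights)[0]
                bump(d, z[d][i], v, 1)

    phi = [[(n_kw[k][v] + beta) / (n_k[k] + V * beta) for v in range(V)]
           for k in range(num_topics)]
    theta = [[(c + alpha) / (sum(row) + num_topics * alpha) for c in row]
             for row in n_dk]
    return vocab, phi, theta


def infer_topics(doc, vocab, phi, alpha=0.1, iters=50):
    """Fold-in: estimate a topic distribution for an unseen document."""
    K = len(phi)
    words = [vocab[w] for w in doc if w in vocab]  # skip out-of-vocabulary words
    theta = [1.0 / K] * K
    for _ in range(iters):
        counts = [alpha] * K
        for v in words:
            post = [theta[k] * phi[k][v] for k in range(K)]
            total = sum(post)
            for k in range(K):
                counts[k] += post[k] / total
        norm = sum(counts)
        theta = [c / norm for c in counts]
    return theta


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def label_post(post, titles, vocab, phi, entry_theta, top_n=1):
    """Return the titles of the entries whose topic mix is closest to the post's."""
    theta = infer_topics(post, vocab, phi)
    ranked = sorted(zip(titles, entry_theta),
                    key=lambda pair: cosine(theta, pair[1]), reverse=True)
    return [title for title, _ in ranked[:top_n]]


# Hypothetical toy data standing in for collected Wikipedia entries.
titles = ["Baking", "Football", "Bread", "Goalkeeper"]
entries = [
    "oven flour sugar bake butter oven flour".split(),
    "match goal team player league goal team".split(),
    "flour yeast oven bake dough flour yeast".split(),
    "goal save team match player save goal".split(),
]
vocab, phi, entry_theta = fit_lda(entries, num_topics=2)

blog_post = "we bake with flour and butter in the oven".split()
labels = label_post(blog_post, titles, vocab, phi, entry_theta)
print(labels)
```

Because the topic model is estimated on the Wikipedia entries rather than on the blog posts, every topic stays tied to named entries, which is what makes the labels readable in this sketch.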
Pages: 297-306
Page count: 10