Dimensionality reduction framework for blog mining and visualisation

Cited by: 2
Authors
Tsai, Flora S. [1]
Affiliations
[1] Nanyang Technological University, School of Electrical and Electronic Engineering, Singapore 639798, Singapore
Keywords
blog mining; dimensionality reduction; visualisation; multidimensional scaling; MDS; isometric feature mapping; Isomap; locally linear embedding; LLE; latent Dirichlet allocation; LDA
DOI
10.1504/IJDMMM.2012.048108
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The growing abundance of blogs and new forms of social media has created a critical need for new technologies to transform the digital realm of social media into a manageable form. Blog mining addresses the domain-specific problem of mining information from blog data. Although mining blogs shares many similarities with mining web and text documents, existing data mining techniques need to be reevaluated and adapted for the multidimensional representation of blog data, which exhibits dimensions not present in traditional documents. In this paper, a new approach is presented for blog mining and visualisation based on dimensionality reduction techniques. The author-topic model based on latent Dirichlet allocation was extended for analysing and visualising blog authors, links, and time. A framework based on dimensionality reduction is proposed to visualise the blog dimensions of content, tags, authors, links, and time. This framework has been designed, implemented, and evaluated on real-world blog data.
Pages: 267-285
Page count: 19