MR-LDA: An Efficient Topic Model for Classification of Short Text in Big Social Data

Times cited: 10
Authors
Pang, Xiongwen [1]
Wan, Benshuai [2]
Li, Huifang [1]
Lin, Weiwei [3]
Affiliations
[1] South China Normal Univ, Sch Comp, Guangzhou, Guangdong, Peoples R China
[2] Guangdong Nanhai Rural Commercial Bank Co Ltd, Dept Informat Technol, Guangzhou, Guangdong, Peoples R China
[3] South China Univ Technol, Sch Comp Sci & Engn, Guangzhou, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Big Data; Latent Dirichlet Allocation; Micro-Blog; Social Network; Topic Mining
DOI
10.4018/IJGHPC.2016100106
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
Latent Dirichlet Allocation (LDA) is an efficient text-mining method, but applying it directly to Chinese micro-blog texts does not work well because micro-blogs are highly social, brief, and closely related to one another. Building on LDA, this paper proposes the Micro-blog Relation LDA model (MR-LDA), which takes the relations among Chinese micro-blog documents into account to improve topic mining on micro-blogs. The authors extend LDA in two ways. First, they aggregate several Chinese micro-blogs into a single micro-blog document to address the problem of short texts. Second, they model the generation process of Chinese micro-blogs more accurately by considering the relationships between micro-blog documents. As a result, MR-LDA is better suited to modeling Chinese micro-blog data. Gibbs sampling is used for model inference. Experimental results on real datasets show that MR-LDA offers an effective solution for mining Chinese micro-blog text.
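The abstract names two concrete steps: aggregating short micro-blogs into longer documents, and inferring topics with Gibbs sampling. The sketch below illustrates only those two generic pieces under stated assumptions. Posts are grouped by a hypothetical relation key (for example a conversation-thread id), and inference uses the standard collapsed Gibbs sampler for plain LDA. It does not reproduce MR-LDA's relation-aware generative process, which the abstract does not specify. The function names `aggregate_posts` and `gibbs_lda` and the hyperparameter defaults are illustrative assumptions.

```python
import random
from collections import defaultdict


def aggregate_posts(posts):
    """Merge short posts that share a relation key into one pseudo-document.

    `posts` is a list of (key, tokens) pairs. The key (e.g. a hypothetical
    thread id) is an illustrative assumption, not MR-LDA's actual rule.
    """
    docs = defaultdict(list)
    for key, tokens in posts:
        docs[key].extend(tokens)
    return list(docs.values())


def gibbs_lda(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for plain LDA (not the MR-LDA extension)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    ndk = [[0] * K for _ in docs]      # tokens of topic k in document d
    nkw = [[0] * V for _ in range(K)]  # occurrences of word v in topic k
    nk = [0] * K                       # total tokens assigned to topic k
    z = []                             # topic assignment of every token

    # Random initial topic assignments.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k, v = rng.randrange(K), word_id[w]
            ndk[d][k] += 1
            nkw[k][v] += 1
            nk[k] += 1
            zd.append(k)
        z.append(zd)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Full conditional:
                # p(z=t | rest) ∝ (n_dt + alpha) * (n_tv + beta) / (n_t + V*beta)
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
                z[d][i] = k

    # Point estimates of document-topic (theta) and topic-word (phi) distributions.
    theta = [
        [(ndk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(nkw[k][v] + beta) / (nk[k] + V * beta) for v in range(V)]
        for k in range(K)
    ]
    return theta, phi, vocab
```

For example, `aggregate_posts([("t1", ["phone", "battery"]), ("t1", ["battery", "charge"])])` returns the single pseudo-document `[["phone", "battery", "battery", "charge"]]`, which can then be passed to `gibbs_lda`. Pure Python lists keep the counting logic readable; a realistic micro-blog corpus would call for vectorized counts.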
Pages: 100-113
Page count: 14