MR-LDA: An Efficient Topic Model for Classification of Short Text in Big Social Data

Cited by: 9
Authors
Pang, Xiongwen [1]
Wan, Benshuai [2]
Li, Huifang [1]
Lin, Weiwei [3]
Affiliations
[1] South China Normal Univ, Sch Comp, Guangzhou, Guangdong, Peoples R China
[2] Guangdong Nanhai Rural Commercial Bank Co Ltd, Dept Informat Technol, Guangzhou, Guangdong, Peoples R China
[3] South China Univ Technol, Sch Comp Sci & Engn, Guangzhou, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Big Data; Latent Dirichlet Allocation; Micro-Blog; Social Network; Topic Mining;
DOI
10.4018/IJGHPC.2016100106
Chinese Library Classification number
TP301 [Theory and methods];
Discipline classification code
081202;
Abstract
Latent Dirichlet Allocation (LDA) is an efficient method for text mining, but applying LDA directly to Chinese micro-blog texts does not work well because micro-blogs are more social, brief, and closely related to one another. Based on LDA, this paper proposes a Micro-blog Relation LDA model (MR-LDA), which takes the relations between Chinese micro-blog documents into consideration to help topic mining in micro-blogs. The authors extend LDA in two ways. First, they aggregate several Chinese micro-blogs into a single micro-blog document to address the short-text problem. Second, they model the generation process of Chinese micro-blogs more accurately by taking the relationships between micro-blog documents into account. MR-LDA is better suited to modeling Chinese micro-blog data. Gibbs sampling is used for inference. Experimental results on real datasets show that the MR-LDA model offers an effective solution for text mining on Chinese micro-blogs.
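The two ingredients named in the abstract, aggregating short posts into longer documents and inferring topics by Gibbs sampling, can be illustrated with a minimal Python sketch. It is not the authors' MR-LDA: the relation modelling between documents is omitted because the abstract does not specify it, and the sampler below is ordinary collapsed Gibbs sampling for plain LDA. Grouping posts by author, the function names aggregate_posts and gibbs_lda, the hyperparameter values, and the toy tokens are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def aggregate_posts(posts):
    """Merge short posts that share a grouping key into one pseudo-document.

    `posts` is a list of (key, tokens) pairs. Grouping by author is an
    assumption of this sketch; the abstract does not state the key used.
    """
    grouped = defaultdict(list)
    for key, tokens in posts:
        grouped[key].extend(tokens)
    return dict(grouped)


def gibbs_lda(docs, num_topics, iterations=50, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for plain LDA (no relation modelling)."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]        # n(d, k)
    topic_word = [defaultdict(int) for _ in range(num_topics)]  # n(k, w)
    topic_total = [0] * num_topics                      # n(k)
    assignments = []

    # Random initial topic for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Unnormalised conditional p(z = k | all other assignments).
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return doc_topic, topic_word


# Toy, made-up posts: (author, tokens).
posts = [
    ("user_a", ["match", "goal"]),
    ("user_b", ["stock", "market"]),
    ("user_a", ["team", "goal"]),
    ("user_b", ["market", "price"]),
]
merged = aggregate_posts(posts)
doc_topic, topic_word = gibbs_lda(list(merged.values()), num_topics=2)
```

Here doc_topic[d][k] counts how many tokens of aggregated document d are assigned to topic k, so each row sums to that document's length. Aggregation gives the sampler more word co-occurrences per document than a single short post would.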
Pages: 100-113
Page count: 14