Identifying and Understanding Business Trends using Topic Models with Word Embedding

Cited by: 0
Authors
Pek, Yun Ning [1]
Lim, Kwan Hui [2]
Affiliations
[1] Singapore Univ Technol & Design, Engn Syst & Design Pillar, 8 Somapah Rd, Singapore 487372, Singapore
[2] Singapore Univ Technol & Design, Informat Syst Technol & Design Pillar, 8 Somapah Rd, Singapore 487372, Singapore
Source
2019 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA) | 2019
Keywords
Topic Models; Word Embedding; Trend Analysis; Academic Papers;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Topic modelling and trend analysis are increasingly important in today's digital world, especially for identifying promising business ideas and trends. With the increasing amount of data being generated daily, a key challenge is to effectively identify emerging business ideas/topics and trends from this large volume of data. Towards this effort, we introduce a framework that allows us to identify promising business ideas from a large stream of academic papers. Academic papers are suitable for this purpose as they study emerging areas and problems in different domains. Our framework comprises three main components, namely: (i) a data collection component that retrieves academic papers and their meta-data; (ii) a topic modelling algorithm that combines traditional topic modelling techniques with recent advances in word embeddings; and (iii) a trend analysis component that allows us to visualize the popularity of different business trends/topics across time. Results on a corpus of 287k academic papers show that our proposed methods outperform the standard baselines based on topic coherence scores and also allow us to understand key temporal trends.
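The abstract does not say how topic popularity across time is measured. As a rough illustration of what a trend analysis component of this kind might compute, the sketch below assumes each paper has already been assigned a single dominant topic and reports each topic's share of papers per year. The function name `topic_share_by_year`, the topic labels, and the toy data are all hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def topic_share_by_year(assignments):
    """Return {year: {topic: share}} from an iterable of (year, topic) pairs.

    Assumption: each paper contributes exactly one (year, topic) pair,
    i.e. its publication year and its dominant topic.
    """
    counts = defaultdict(Counter)
    for year, topic in assignments:
        counts[year][topic] += 1

    shares = {}
    for year in sorted(counts):
        total = sum(counts[year].values())
        # Share of that year's papers whose dominant topic is `topic`.
        shares[year] = {t: n / total for t, n in counts[year].items()}
    return shares


# Hypothetical toy data: (publication year, dominant topic label).
papers = [
    (2017, "blockchain"),
    (2017, "robotics"),
    (2018, "blockchain"),
    (2018, "blockchain"),
    (2018, "blockchain"),
    (2018, "robotics"),
]

trend = topic_share_by_year(papers)
for year, topic_shares in trend.items():
    print(year, topic_shares)
```

Using shares rather than raw counts keeps years with different numbers of papers comparable, which matters when the corpus grows over time.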
Pages: 6177-6179
Page count: 3