sDTM: A Supervised Bayesian Deep Topic Model for Text Analytics

Cited by: 20

Authors:

Yang, Yi [1]

Zhang, Kunpeng [2]

Fan, Yangyang [3]

Affiliations:

[1] Hong Kong Univ Sci & Technol, Dept Informat Syst Business Stat & Operat Managem, Hong Kong, Peoples R China

[2] Univ Maryland, Robert H Smith Sch Business, Dept Decis Operat & Informat Technol, College Pk, MD 20742 USA

[3] Hong Kong Polytech Univ, Fac Business, Sch Accounting & Finance, Hong Kong, Peoples R China

Source:

INFORMATION SYSTEMS RESEARCH | 2023 / Vol. 34 / No. 01

Keywords:

supervised topic modeling; Bayesian variational inference; deep learning; text analysis; BIG DATA; ONLINE REVIEWS; IMPACT; CLASSIFICATION; INFORMATION; RESPONSES;

DOI:

10.1287/isre.2022.1124

Chinese Library Classification (CLC) number:

G25 [Library science, librarianship]; G35 [Information science, information work];

Discipline classification code:

1205 ; 120501 ;

Abstract:

Topic modeling methods such as latent Dirichlet allocation (LDA) are powerful tools for analyzing massive amounts of textual data. They have been used extensively in information systems (IS) and business discipline research to identify latent topics for data exploration and as a feature engineering mechanism to derive new variables for analyses. However, existing topic modeling approaches are mostly unsupervised and only leverage textual data, while ignoring additional useful metadata often associated with text, such as star ratings in customer reviews or categories of posts in online forums. As a result, the identified topics and variables derived based on the learned topic model may not be accurate, which could lead to incorrect estimations that affect subsequent empirical analysis and to inferior performance on predictive tasks. In this study, we propose a novel supervised deep topic modeling approach called sDTM, which combines a neural variational autoencoder model and a recurrent neural network. sDTM leverages the auxiliary data associated with text to enhance the topic modeling capability. We conduct empirical case studies and predictive analytics on an online consumer review data set and an online knowledge community data set. Experimental results show that in comparison with benchmark methods, sDTM can enhance both the empirical estimation and predictive performance. sDTM makes methodological contributions to the IS literature and has direct relevance for research using text analytics.

Pages: 137-156

Page count: 20
