sDTM: A Supervised Bayesian Deep Topic Model for Text Analytics

Times cited: 20
Authors
Yang, Yi [1]
Zhang, Kunpeng [2]
Fan, Yangyang [3]
Affiliations
[1] Hong Kong Univ Sci & Technol, Dept Informat Syst Business Stat & Operat Managem, Hong Kong, Peoples R China
[2] Univ Maryland, Robert H Smith Sch Business, Dept Decis Operat & Informat Technol, College Pk, MD 20742 USA
[3] Hong Kong Polytech Univ, Fac Business, Sch Accounting & Finance, Hong Kong, Peoples R China
Keywords
supervised topic modeling; Bayesian variational inference; deep learning; text analysis; BIG DATA; ONLINE REVIEWS; IMPACT; CLASSIFICATION; INFORMATION; RESPONSES;
DOI
10.1287/isre.2022.1124
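Chinese Library Classification number
G25 [Library science, librarianship]; G35 [Information science, information work]
Discipline classification code
1205; 120501
Abstract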
Topic modeling methods such as latent Dirichlet allocation (LDA) are powerful tools for analyzing massive amounts of textual data. They have been used extensively in information systems (IS) and business discipline research to identify latent topics for data exploration and as a feature engineering mechanism to derive new variables for analyses. However, existing topic modeling approaches are mostly unsupervised and only leverage textual data, while ignoring additional useful metadata often associated with text, such as star ratings in customer reviews or categories of posts in online forums. As a result, the identified topics and variables derived based on the learned topic model may not be accurate, which could lead to incorrect estimations that affect subsequent empirical analysis and to inferior performance on predictive tasks. In this study, we propose a novel supervised deep topic modeling approach called sDTM, which combines a neural variational autoencoder model and a recurrent neural network. sDTM leverages the auxiliary data associated with text to enhance the topic modeling capability. We conduct empirical case studies and predictive analytics on an online consumer review data set and an online knowledge community data set. Experimental results show that in comparison with benchmark methods, sDTM can enhance both the empirical estimation and predictive performance. sDTM makes methodological contributions to the IS literature and has direct relevance for research using text analytics.
Pages: 137-156
Page count: 20