Improving Text Analysis Using Sentence Conjunctions and Punctuation

Cited by: 20
Authors
Bueschken, Joachim [1]
Allenby, Greg M. [2]
Affiliations
[1] Catholic Univ Eichstatt Ingolstadt, Sch Management, D-85049 Ingolstadt, Germany
[2] Ohio State Univ, Fisher Coll Business, Columbus, OH 43210 USA
Keywords
user-generated content; latent Dirichlet allocation (LDA); topic dependency; syntactic covariates; Bayesian analysis; customer satisfaction analysis; changepoint model; customer
DOI
10.1287/mksc.2019.1214
Chinese Library Classification (CLC) number
F [Economics]
Subject classification code
02
Abstract
User-generated content in the form of customer reviews, blogs, and tweets is an emerging and rich source of data for marketers. Topic models have been successfully applied to such data, demonstrating that empirical text analysis benefits greatly from a latent variable approach that summarizes high-level interactions among words. We propose a new topic model that allows for serial dependency of topics in text. That is, topics may carry over from word to word in a document, violating the bag-of-words assumption in traditional topic models. In the proposed model, topic carryover is informed by sentence conjunctions and punctuation. Typically, such observed information is eliminated prior to analyzing text data (i.e., preprocessing) because words such as "and" and "but" do not differentiate topics. We find that these elements of grammar contain information relevant to topic changes. We examine the performance of our models using multiple data sets and establish boundary conditions for when our model leads to improved inference about customer evaluations. Implications and opportunities for future research are discussed.
Pages: 727-742
Page count: 16