Empath: Understanding Topic Signals in Large-Scale Text

Cited by: 206
Authors
Fast, Ethan [1]
Chen, Binbin [1]
Bernstein, Michael S. [1]
Affiliations
[1] Stanford Univ, Stanford, CA 94305 USA
Source
34TH ANNUAL CHI CONFERENCE ON HUMAN FACTORS IN COMPUTING SYSTEMS, CHI 2016 | 2016
Keywords
social computing; computational social science; fiction
DOI
10.1145/2858036.2858535
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Human language is colored by a broad range of topics, but existing text analysis tools only focus on a small number of them. We present Empath, a tool that can generate and validate new lexical categories on demand from a small set of seed terms (like "bleed" and "punch" to generate the category violence). Empath draws connotations between words and phrases by deep learning a neural embedding across more than 1.8 billion words of modern fiction. Given a small set of seed words that characterize a category, Empath uses its neural embedding to discover new related terms, then validates the category with a crowd-powered filter. Empath also analyzes text across 200 built-in, pre-validated categories we have generated from common topics in our web dataset, like neglect, government, and social media. We show that Empath's data-driven, human-validated categories are highly correlated (r=0.906) with similar categories in LIWC.
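The seed-expansion step described in the abstract can be illustrated with a minimal sketch. This is not Empath's implementation: the three-dimensional vectors below are invented toy values, the names `TOY_EMBEDDING`, `cosine`, and `expand_category` are hypothetical, and ranking candidates by cosine similarity to the mean of the seed vectors is an assumed approach. The crowd-powered validation step is not modeled.

```python
import math

# Toy vectors invented for illustration only; they are NOT taken from
# Empath's embedding, which the abstract says is learned from fiction text.
TOY_EMBEDDING = {
    "bleed": [0.9, 0.1, 0.0],
    "punch": [0.8, 0.2, 0.1],
    "stab": [0.85, 0.15, 0.05],
    "wound": [0.9, 0.2, 0.0],
    "senate": [0.0, 0.9, 0.1],
    "tweet": [0.1, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_category(seeds, embedding, top_n=2):
    """Return the top_n non-seed words closest to the mean seed vector.

    Assumed approach: average the seed vectors, then rank every other
    word in the vocabulary by cosine similarity to that average.
    """
    known = [embedding[s] for s in seeds if s in embedding]
    if not known:
        return []
    dims = len(known[0])
    centroid = [sum(vec[i] for vec in known) / len(known) for i in range(dims)]
    candidates = [
        (cosine(centroid, vec), word)
        for word, vec in embedding.items()
        if word not in seeds
    ]
    candidates.sort(reverse=True)
    return [word for _, word in candidates[:top_n]]


# Seed terms from the abstract's "violence" example.
violence_terms = expand_category(["bleed", "punch"], TOY_EMBEDDING)
```

With these toy vectors, the seeds "bleed" and "punch" pull in "stab" and "wound" ahead of "senate" and "tweet". In the pipeline the abstract describes, candidate terms like these would then pass through the crowd-powered filter before joining the category.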
Pages: 4647-4657
Page count: 11