Empath: Understanding Topic Signals in Large-Scale Text

Times cited: 206

Authors:

Fast, Ethan [1]

Chen, Binbin [1]

Bernstein, Michael S. [1]

Affiliation:

[1] Stanford Univ, Stanford, CA 94305 USA

Source:

34TH ANNUAL CHI CONFERENCE ON HUMAN FACTORS IN COMPUTING SYSTEMS, CHI 2016 | 2016

Keywords:

social computing; computational social science; fiction

DOI:

10.1145/2858036.2858535

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Human language is colored by a broad range of topics, but existing text analysis tools focus on only a small number of them. We present Empath, a tool that can generate and validate new lexical categories on demand from a small set of seed terms (for example, "bleed" and "punch" to generate the category "violence"). Empath draws connotations between words and phrases by using deep learning to train a neural embedding on more than 1.8 billion words of modern fiction. Given a small set of seed words that characterize a category, Empath uses this embedding to discover new related terms, then validates the category with a crowd-powered filter. Empath also analyzes text across 200 built-in, pre-validated categories that we generated from common topics in our web dataset, such as neglect, government, and social media. We show that Empath's data-driven, human-validated categories are highly correlated (r = 0.906) with similar categories in LIWC.
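The seed-expansion and category-counting steps described in the abstract can be illustrated with a minimal, hypothetical Python sketch. It assumes a pre-trained word embedding supplied as a plain dict of word → vector (the toy 2-D vectors in `TOY_EMBEDDING` are invented for illustration and are not Empath's fiction-trained model), ranks candidate terms by cosine similarity to the mean of the seed vectors (an assumed query construction), and stands in for the crowd-powered filter with an `approve` callback. The names `expand_category`, `score_text`, and `TOY_EMBEDDING` are illustrative only and are not part of Empath's actual API.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]

# Hypothetical toy embedding; a real system would use vectors learned from a large corpus.
TOY_EMBEDDING: Dict[str, Vector] = {
    "bleed": (1.0, 0.1),
    "punch": (0.9, 0.2),
    "stab": (0.95, 0.15),
    "kick": (0.85, 0.25),
    "tweet": (0.1, 1.0),
    "post": (0.2, 0.9),
}


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def expand_category(
    seeds: List[str],
    embedding: Dict[str, Vector],
    k: int = 5,
    approve: Callable[[str], bool] = lambda term: True,
) -> List[str]:
    """Hypothetical sketch: grow a category from seed terms.

    1. Average the known seed vectors into one query vector (an assumption).
    2. Rank all non-seed vocabulary terms by cosine similarity to the query.
    3. Keep the top-k candidates that pass `approve`, a stand-in for crowd validation.
    """
    known = [s for s in seeds if s in embedding]
    if not known:
        raise ValueError("none of the seed terms are in the embedding")
    dims = len(embedding[known[0]])
    query = [sum(embedding[s][i] for s in known) / len(known) for i in range(dims)]
    candidates = sorted(
        (w for w in embedding if w not in seeds),
        key=lambda w: cosine(embedding[w], query),
        reverse=True,
    )
    return seeds + [w for w in candidates[:k] if approve(w)]


def score_text(tokens: List[str], lexicon: Dict[str, List[str]]) -> Dict[str, int]:
    """Count how many tokens fall into each category's term list."""
    counts: Dict[str, int] = {}
    for category, terms in lexicon.items():
        vocab = set(terms)
        counts[category] = sum(1 for t in tokens if t in vocab)
    return counts


if __name__ == "__main__":
    violence = expand_category(["bleed", "punch"], TOY_EMBEDDING, k=2)
    print(violence)  # ['bleed', 'punch', 'stab', 'kick']
    print(score_text("he would punch and stab and tweet".split(), {"violence": violence}))
    # {'violence': 2}
```

Averaging the seeds keeps the query close to what the seeds share, so off-topic neighbours of any single seed rank lower; the `approve` callback is where a human-validation step would plug in.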


Pages: 4647–4657

Page count: 11
