Constructing Structured Information Networks from Massive Text Corpora

Cited by: 2
Authors
Ren, Xiang [1]
Jiang, Meng [1]
Shang, Jingbo [1]
Han, Jiawei [1]
Affiliations
[1] Univ Illinois, Dept Comp Sci, Champaign, IL 61820 USA
Source
WWW'17 COMPANION: PROCEEDINGS OF THE 26TH INTERNATIONAL CONFERENCE ON WORLD WIDE WEB | 2017
Funding
US National Science Foundation
Keywords
Quality Phrase Mining; Entity Recognition and Typing; Attribute Discovery; Massive Text Corpora; Relation Extraction;
DOI
10.1145/3041021.3051107
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In today's computerized and information-based society, text data is rich but messy. People are inundated with vast amounts of natural-language text, ranging from news articles, social media posts, and advertisements to textual information from a wide range of domains (e.g., medical records, corporate reports). To turn such massive unstructured text data into actionable knowledge, one of the grand challenges is to understand the factual information (e.g., entities, attributes, relations, events) in the text. In this tutorial, we introduce data-driven methods that construct structured information networks (where nodes are entities of different types with attached attributes, and edges are different relations between entities) from text corpora of different kinds (especially massive, domain-specific text corpora) to represent their factual information. We focus on methods that are minimally supervised, domain-independent, and language-independent, enabling fast network construction across application domains (news, web, biomedical, reviews). We demonstrate on real datasets, including news articles, scientific publications, tweets, and reviews, how the constructed networks aid text analytics and knowledge discovery at scale.
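The network structure described in the abstract (typed entity nodes carrying attributes, connected by typed relation edges) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class names `Entity` and `InformationNetwork`, the method names, the relation label `affiliated_with`, and the type labels are all hypothetical, and the toy data simply reuses an author and affiliation from this record.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Entity:
    """A typed node with free-form attributes (hypothetical schema)."""
    name: str
    entity_type: str
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class InformationNetwork:
    """Entities keyed by name, plus (head, relation, tail) edges."""
    nodes: Dict[str, Entity] = field(default_factory=dict)
    edges: List[Tuple[str, str, str]] = field(default_factory=list)

    def add_entity(self, name: str, entity_type: str, **attributes: str) -> None:
        self.nodes[name] = Entity(name, entity_type, dict(attributes))

    def add_relation(self, head: str, relation: str, tail: str) -> None:
        # Edges may only connect entities that are already in the network.
        if head not in self.nodes or tail not in self.nodes:
            raise KeyError("both endpoints must be added as entities first")
        self.edges.append((head, relation, tail))

    def neighbors(self, name: str, relation: Optional[str] = None) -> List[str]:
        # Tails of outgoing edges from `name`, optionally filtered by relation.
        return [t for h, r, t in self.edges
                if h == name and (relation is None or r == relation)]


# Toy example; type and relation labels are assumptions for illustration.
net = InformationNetwork()
net.add_entity("Jiawei Han", "person")
net.add_entity("University of Illinois", "organization", location="Champaign, IL")
net.add_relation("Jiawei Han", "affiliated_with", "University of Illinois")
```

In the setting the tutorial describes, the entities, attributes, and relations would be extracted from text rather than entered by hand; the sketch only shows the shape of the resulting structure.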
Pages: 951-954
Number of pages: 4