Profile generation from web sources: an information extraction system

Cited by: 0
Authors
Rishabh Ranjan
H. Vathsala
Shashidhar G. Koolagudi
Affiliations
[1] Birla Institute of Technology
[2] Centre for Development of Advanced Computing
[3] National Institute of Technology Karnataka
Source
Social Network Analysis and Mining | 2022 / Vol. 12
Keywords
Information extraction; ProfileGen; Data mining; Natural language processing; Recurrent neural network; Biography generation
DOI
Not available
Abstract
The Internet holds a vast collection of information, much of it unstructured. Sources such as social media, news articles, blogs, speeches and videos often contain information that could be used to build decision-making tools, such as reports about events and individuals. Processing this information manually is long and tedious. Over the years, a great deal of research in data mining and natural language processing has aimed to make this volume of data easier to consume. The current work describes ProfileGen, an information extraction system that draws on a variety of these data sources to form a profile of a given person. The application has two parts. The first collects publicly available information from social media sites, news websites and blogs and compiles it into a corpus about the given person. The second ranks this information using machine learning techniques, so that it is presented in order of importance.
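The two-part flow described in the abstract (compile a corpus, then rank it) can be illustrated with a minimal sketch. This is not ProfileGen's implementation: the abstract says only that ranking uses machine learning techniques, and here a simple TF-IDF similarity centrality score is swapped in as a stand-in. The function names `compile_corpus` and `rank_sentences`, and the dictionary layout of the input, are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def compile_corpus(sources):
    """Part 1 (illustrative): merge text snippets from several sources.

    `sources` is a hypothetical mapping of source name -> list of snippets.
    Empty snippets and case-insensitive duplicates are dropped.
    """
    seen = set()
    corpus = []
    for snippets in sources.values():
        for snippet in snippets:
            text = " ".join(snippet.split())  # normalize whitespace
            key = text.lower()
            if text and key not in seen:
                seen.add(key)
                corpus.append(text)
    return corpus


def tokenize(text):
    """Lowercase a sentence and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(sentences):
    """Build one sparse TF-IDF vector (a dict) per sentence."""
    token_lists = [tokenize(s) for s in sentences]
    n = len(sentences)
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        # Smoothed IDF stays positive even for words found in every sentence.
        vectors.append({w: c * (math.log((1 + n) / (1 + doc_freq[w])) + 1.0)
                        for w, c in counts.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_sentences(sentences):
    """Part 2 (stand-in heuristic, not the paper's model).

    Scores each sentence by its total similarity to all other sentences
    and returns (score, sentence) pairs, highest score first.
    """
    vectors = tfidf_vectors(sentences)
    scored = []
    for i, sentence in enumerate(sentences):
        score = sum(cosine(vectors[i], vectors[j])
                    for j in range(len(sentences)) if j != i)
        scored.append((score, sentence))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

Calling `rank_sentences(compile_corpus(sources))` places snippets that share vocabulary with many others near the top and isolated, off-topic snippets near the bottom. A learned ranker, as the abstract describes, would replace the centrality score with a model trained on labelled importance judgments.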