A Parallel Platform for Web Text Mining

Cited by: 1
Authors
Ping Lu [1]
Zhenjiang Dong [1]
Shengmei Luo [1]
Lixia Liu [1]
Shanshan Guan [2]
Shengyu Liu [2]
Qingcai Chen [2]
Affiliations
[1] ZTE Corporation
[2] Shenzhen Graduate School, Harbin Institute of Technology
Keywords
natural language processing; text mining; massive data; parallel; web knowledge service
DOI
Not available
Chinese Library Classification (CLC) number
TP311.13
Subject classification code
1201
Abstract
With user-generated content, anyone can be a content creator. This phenomenon has greatly increased the amount of information circulating online, making it harder to obtain required information efficiently. In this paper, we describe how natural language processing and text mining can be parallelized using Hadoop and the Message Passing Interface (MPI). We propose a parallel web text mining platform that processes massive amounts of data quickly and efficiently. Our web knowledge service platform is designed to collect information about the IT and telecommunications industries from the web and to process this information using natural language processing and data-mining techniques.
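As an illustration of the map-and-reduce style of parallelization that Hadoop supports, the following is a minimal sketch, not the authors' implementation. It assumes term counting as the text-mining task and uses a local thread pool as a stand-in for Hadoop worker nodes; the names `map_document`, `reduce_counts`, and `term_frequencies` are hypothetical.

```python
import re
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def map_document(text):
    """Map step (hypothetical helper): tokenize one document and count its terms."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(tokens)


def reduce_counts(partial_counts):
    """Reduce step (hypothetical helper): merge per-document counts into one total."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return total


def term_frequencies(documents, workers=4):
    """Run the map step over all documents concurrently, then reduce.

    Assumption: a local thread pool stands in for distributed workers.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial = list(pool.map(map_document, documents))
    return reduce_counts(partial)


if __name__ == "__main__":
    docs = [
        "Hadoop and MPI",
        "MPI and text mining",
    ]
    # Print the merged term counts for the two sample documents.
    print(dict(term_frequencies(docs)))
```

Each document is processed independently in the map step, which is what lets the work be spread across many machines; only the small per-document counts need to be combined in the reduce step.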
Pages: 56-61
Page count: 6