ILLINOISCLOUDNLP: Text Analytics Services in the Cloud

Authors
Wu, Hao [1]
Fei, Zhiye [1]
Dai, Aaron [1]
Mayhew, Stephen [1]
Sammons, Mark [1]
Roth, Dan [1]
Affiliation
[1] University of Illinois, Department of Computer Science, Urbana, IL 61801, USA
Source
LREC 2014 - Ninth International Conference on Language Resources and Evaluation | 2014
Keywords
Natural Language Processing Tools; Text Analytics; Cloud Computing;
DOI
Not available
Chinese Library Classification
H0 [Linguistics]
Subject classification codes
030303; 0501; 050102
Abstract
Natural Language Processing (NLP) continues to grow in popularity in a range of research and commercial applications. However, installing, maintaining, and running NLP tools can be time-consuming, and many commercial and research end users need large processing capacity only intermittently. This paper describes ILLINOISCLOUDNLP, an on-demand framework built around NLPCURATOR and Amazon Web Services' Elastic Compute Cloud (EC2). The framework gives end users a simple interface through which they can deploy one or more NLPCURATOR instances on EC2, upload plain-text documents, specify a set of Text Analytics tools (NLP annotations) to apply, and then process the documents and either store or download the results. It also lets end users apply a model trained on their own data: ILLINOISCLOUDNLP takes care of training, hosting, and applying it to new data, just as it does with the existing models within NLPCURATOR. As a representative use case, we describe how we used ILLINOISCLOUDNLP to process, at a relatively deep level of analysis, the 3.05 million documents used in the 2012 and 2013 Text Analysis Conference Knowledge Base Population tasks in approximately 20 hours at an approximate cost of US$500. This is about 20 times faster than doing so on a single server, and it requires no human supervision and no NLP or Machine Learning expertise.
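The totals reported in the abstract can be turned into per-unit rates with simple arithmetic. The sketch below is not from the paper: it only divides the approximate figures the abstract states (3.05 million documents, about 20 hours, about US$500, about 20 times faster than one server), so every derived rate is equally approximate. The function name `implied_figures` is hypothetical and used only for illustration.

```python
# Figures stated in the abstract (all approximate).
DOCUMENTS = 3_050_000      # documents processed
WALL_CLOCK_HOURS = 20.0    # approximate elapsed time on EC2
COST_USD = 500.0           # approximate total cost in US dollars
SPEEDUP = 20.0             # "about 20 times faster" than a single server


def implied_figures(docs: float, hours: float, cost: float, speedup: float) -> dict:
    """Hypothetical helper: derive back-of-the-envelope rates from reported totals."""
    return {
        # Documents annotated per hour of elapsed time across the whole deployment.
        "docs_per_hour": docs / hours,
        # Cost normalised to a batch of 1,000 documents.
        "cost_per_1000_docs": 1000 * cost / docs,
        # Elapsed time a single server would need, given the stated speedup.
        "single_server_hours": hours * speedup,
    }


if __name__ == "__main__":
    figs = implied_figures(DOCUMENTS, WALL_CLOCK_HOURS, COST_USD, SPEEDUP)
    for name, value in figs.items():
        print(f"{name}: {value:,.2f}")
```

Under these assumptions the deployment processed roughly 152,500 documents per hour at about US$0.16 per thousand documents, while a single server would have needed on the order of 400 hours (more than two weeks) for the same collection.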
Pages: 8