A Cloud-hosted MapReduce Architecture for Syntactic Parsing

Times cited: 0
Authors
Woldemariam, Yonas D. [1]
Pletschacher, Stefan [2]
Clausner, Christian [2]
Bass, Julian M. [2]
Affiliations
[1] Umea Univ, Comp Sci, Umea, Sweden
[2] Univ Salford, Comp Sci & Software Engn, Manchester, Lancs, England
Source
2019 45TH EUROMICRO CONFERENCE ON SOFTWARE ENGINEERING AND ADVANCED APPLICATIONS (SEAA 2019) | 2019
Keywords
cloud deployment; natural language processing (NLP); syntactic parsing; VIEW;
DOI
10.1109/SEAA.2019.00024
Chinese Library Classification number
TP31 [Computer Software];
Discipline classification code
081202; 0835;
Abstract
Syntactic parsing is a time-consuming task in natural language processing, particularly when a large number of text files are being processed. Parsing algorithms are conventionally designed to operate on a single machine in a sequential fashion and, as a consequence, fail to benefit from the high-performance and parallel computing resources available in the cloud. We designed and implemented a scalable cloud-based architecture supporting parallel and distributed syntactic parsing for large datasets. The main architecture consists of a syntactic parser (constituency and dependency parsing) and a MapReduce framework running on clusters of machines. The resulting cloud-based MapReduce parser builds a map in which the syntactic trees of the same input file share the same key, and collects them into a single file containing the sentences along with their corresponding trees. Our experimental evaluation shows that the architecture scales well with regard to the number of processing nodes and the number of cores per node. In the fastest tested cloud-based setup, the proposed design runs 7 times faster than a local setup. In summary, this study takes an important step toward providing and evaluating a cloud-hosted solution for efficient syntactic parsing of natural language datasets consisting of a large number of files.
Pages: 100-107
Page count: 8
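The keying scheme described in the abstract (trees of the same input file share one key and are collected together with their sentences) can be illustrated with a small in-memory sketch. This is not the authors' implementation: the function names, the naive sentence splitter, and the `toy_parse` stand-in for a real constituency or dependency parser are all assumptions made for illustration, and the sketch assumes one collected output per input file.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Pair = Tuple[str, Tuple[str, str]]  # (file name, (sentence, tree))


def split_sentences(text: str) -> List[str]:
    """Naive splitter on ., ! or ? followed by whitespace (assumption)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def toy_parse(sentence: str) -> str:
    """Hypothetical stand-in for a real syntactic parser: a flat bracketing."""
    return "(S " + " ".join(f"(X {tok})" for tok in sentence.split()) + ")"


def map_phase(file_name: str, text: str) -> Iterator[Pair]:
    """Emit one (key, value) pair per sentence; the key is the input file name."""
    for sentence in split_sentences(text):
        yield file_name, (sentence, toy_parse(sentence))


def shuffle(pairs: Iterable[Pair]) -> Dict[str, List[Tuple[str, str]]]:
    """Group values by key, as a MapReduce framework does between map and reduce."""
    grouped: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(values: List[Tuple[str, str]]) -> str:
    """Collect each sentence followed by its tree into one output text."""
    lines: List[str] = []
    for sentence, tree in values:
        lines.extend([sentence, tree])
    return "\n".join(lines)


def run_job(files: Dict[str, str]) -> Dict[str, str]:
    """Run map, shuffle and reduce sequentially in one process."""
    pairs = (p for name, text in files.items() for p in map_phase(name, text))
    return {name: reduce_phase(vals) for name, vals in shuffle(pairs).items()}


if __name__ == "__main__":
    demo = {"a.txt": "Dogs bark. Cats sleep.", "b.txt": "Birds fly."}
    for name, output in run_job(demo).items():
        print(f"== {name} ==\n{output}")
```

In a real deployment the map calls would run in parallel across cluster nodes and cores, and the grouping step would be handled by the MapReduce framework rather than by an in-process dictionary.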