A Cloud-hosted MapReduce Architecture for Syntactic Parsing

Authors
Woldemariam, Yonas D. [1]
Pletschacher, Stefan [2]
Clausner, Christian [2]
Bass, Julian M. [2]
Affiliations
[1] Umea Univ, Comp Sci, Umea, Sweden
[2] Univ Salford, Comp Sci & Software Engn, Manchester, Lancs, England
Source
2019 45TH EUROMICRO CONFERENCE ON SOFTWARE ENGINEERING AND ADVANCED APPLICATIONS (SEAA 2019) | 2019
Keywords
cloud deployment; natural language processing (NLP); syntactic parsing; VIEW;
DOI
10.1109/SEAA.2019.00024
Chinese Library Classification
TP31 [Computer Software];
Subject classification codes
081202 ; 0835 ;
Abstract
Syntactic parsing is a time-consuming task in natural language processing, particularly when a large number of text files are being processed. Parsing algorithms are conventionally designed to operate on a single machine in a sequential fashion and, as a consequence, fail to benefit from the high-performance and parallel computing resources available on the cloud. We designed and implemented a scalable cloud-based architecture supporting parallel and distributed syntactic parsing for large datasets. The main architecture consists of a syntactic parser (constituency and dependency parsing) and a MapReduce framework running on clusters of machines. The resulting cloud-based MapReduce parsing builds a map in which syntactic trees of the same input file share the same key and are collected into a single file containing the sentences along with their corresponding trees. Our experimental evaluation shows that the architecture scales well with regard to the number of processing nodes and the number of cores per node. In the fastest tested cloud-based setup, the proposed design runs 7 times faster than a local setup. In summary, this study takes an important step toward providing and evaluating a cloud-hosted solution for efficient syntactic parsing of natural language data sets consisting of a large number of files.
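The keying scheme described in the abstract (trees from the same input file share one key and end up in one output) can be illustrated with a minimal single-process sketch. This is not the authors' implementation: the names `toy_parse`, `map_phase`, `shuffle`, `reduce_phase`, and `run_job` are hypothetical, the "parser" is a trivial placeholder rather than a real constituency or dependency parser, sentence splitting on periods is a simplification, and a real deployment would run on a MapReduce framework across cluster nodes instead of in-memory dictionaries.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# A mapped value: (sentence position in its file, sentence text, parse tree).
Value = Tuple[int, str, str]


def toy_parse(sentence: str) -> str:
    """Hypothetical stand-in for a real syntactic parser (assumption)."""
    tokens = sentence.split()
    return "(S " + " ".join(f"(X {token})" for token in tokens) + ")"


def map_phase(file_name: str, text: str) -> Iterator[Tuple[str, Value]]:
    """Emit one (file_name, value) pair per sentence, keyed by the input file."""
    # Naive sentence splitting on periods; an assumption made for brevity.
    sentences = [part.strip() for part in text.split(".") if part.strip()]
    for index, sentence in enumerate(sentences):
        yield file_name, (index, sentence, toy_parse(sentence))


def shuffle(pairs: Iterable[Tuple[str, Value]]) -> Dict[str, List[Value]]:
    """Group mapped values by key, as a MapReduce framework would do."""
    grouped: Dict[str, List[Value]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(values: List[Value]) -> str:
    """Collect the sentences of one file, in order, alongside their trees."""
    return "\n".join(f"{sentence}\t{tree}" for _, sentence, tree in sorted(values))


def run_job(files: Dict[str, str]) -> Dict[str, str]:
    """Run map, shuffle, and reduce over a dictionary of file name to text."""
    pairs = [pair for name, text in files.items() for pair in map_phase(name, text)]
    return {name: reduce_phase(values) for name, values in shuffle(pairs).items()}
```

Calling `run_job({"a.txt": "Dogs bark. Cats sleep."})` yields one entry per input file whose value lists each sentence next to its placeholder tree. Keying on the file name is what lets the map calls run independently on different nodes while still producing a single per-file result in the reduce step.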
Pages: 100-107
Page count: 8