Part-of-speech tagging of building codes empowered by deep learning and transformational rules

Cited by: 26
Authors
Xue, Xiaorui [1]
Zhang, Jiansong [1]
Affiliations
[1] Purdue Univ, Sch Construct Management Technol, Automat & Intelligent Construct Lab, 401 N Grant St, W Lafayette, IN 47907 USA
Funding
U.S. National Science Foundation
Keywords
Automated compliance checking; Automated information extraction; Natural language processing; Part-of-speech tagging; Automated construction management systems; Deep learning; Neural network; Attention
DOI
10.1016/j.aei.2020.101235
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Automated building code compliance checking systems have been under development for many years. However, the large amount of human input needed to convert building codes from natural language into computer-understandable formats has severely limited the range of code requirements they can cover. To address this, automated code compliance checking systems need to support automated regulatory rule conversion. Accurate part-of-speech (POS) tagging of building code text is crucial to this conversion. Previous experiments showed that state-of-the-art generic POS taggers do not perform well on building codes. The authors therefore propose a new POS tagger tailored to building codes that combines a deep learning neural network model with error-driven transformational rules. The neural network model consists of a pre-trained model and one or more trainable neural layers. The pre-trained model was fine-tuned on Part-of-Speech Tagged Building Codes (PTBC), a dataset of POS-tagged building codes. Fine-tuning the pre-trained model allows the proposed tagger to reach high precision with a small amount of training data. Error-driven transformational rules then boost performance further by fixing errors that the neural network model makes in the tagged building code. Through experimental testing, the authors found a well-performing configuration: one bidirectional LSTM trainable layer on top of the BERT_Cased_Base pre-trained model, trained for 50 epochs. This model reached 91.89% precision without error-driven transformational rules and 95.11% precision with them, outperforming the 89.82% precision achieved by state-of-the-art POS taggers.
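The post-correction step described in the abstract can be illustrated with a minimal sketch. The abstract does not state which rule templates the authors used, so this example assumes a single Brill-style template ("retag X as Y when the previous tag is Z"). The function names `apply_rule`, `learn_rules`, and `accuracy`, the Penn Treebank-style tags, and the toy sentences are all hypothetical and stand in for the output of a neural tagger; none of them come from the paper.

```python
# Sketch of error-driven transformational rules (assumed Brill-style template).
# A rule is a tuple (from_tag, to_tag, prev_tag): retag from_tag as to_tag
# whenever the preceding token carries prev_tag.

def apply_rule(tags, rule):
    """Return a corrected copy of tags; context is read from the input tags."""
    from_tag, to_tag, prev_tag = rule
    out = list(tags)
    for i in range(1, len(tags)):
        if tags[i] == from_tag and tags[i - 1] == prev_tag:
            out[i] = to_tag
    return out


def accuracy(pred, gold):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    return sum(p == g for p, g in zip(pred, gold)) / len(gold)


def learn_rules(pred, gold, max_rules=10):
    """Greedily pick the rule that fixes the most remaining errors."""
    rules, current = [], list(pred)
    for _ in range(max_rules):
        base = sum(p == g for p, g in zip(current, gold))
        # Candidate rules are proposed only from the tagger's current errors.
        candidates = {
            (current[i], gold[i], current[i - 1])
            for i in range(1, len(gold))
            if current[i] != gold[i]
        }
        best_rule, best_gain = None, 0
        for rule in sorted(candidates):
            fixed = apply_rule(current, rule)
            gain = sum(p == g for p, g in zip(fixed, gold)) - base
            if gain > best_gain:
                best_rule, best_gain = rule, gain
        if best_rule is None:  # no rule improves the tagging any further
            break
        rules.append(best_rule)
        current = apply_rule(current, best_rule)
    return rules


# Toy data (made up): the "neural tagger" mis-tags the verb after a modal.
tokens = ["Exit", "doors", "shall", "swing", "in", "the",
          "direction", "of", "egress", "travel", "."]
gold = ["NN", "NNS", "MD", "VB", "IN", "DT", "NN", "IN", "NN", "NN", "."]
pred = ["NN", "NNS", "MD", "NN", "IN", "DT", "NN", "IN", "NN", "NN", "."]

rules = learn_rules(pred, gold)

corrected = pred
for rule in rules:
    corrected = apply_rule(corrected, rule)

# The learned rule also transfers to unseen tagger output with the same error.
new_pred = ["NNS", "MD", "NN"]  # e.g. "Handrails shall extend"
new_corrected = new_pred
for rule in rules:
    new_corrected = apply_rule(new_corrected, rule)

print(rules, accuracy(pred, gold), accuracy(corrected, gold), new_corrected)
```

In this sketch the rules are learned from the tagger's own mistakes on annotated text and then applied, in order, to new tagger output. A fuller implementation would likely use several rule templates and hold out data to check that learned rules do not introduce new errors.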
Pages: 14