Intelligent information extraction from government on-site inspection reports of construction projects: A graph-based text mining approach

Cited by: 8
Authors
Liu, Muyang [1]
Luo, Xiaowei [1]
Wang, Guangbin [2]
Lu, Wei-Zhen [1]
Affiliations
[1] City Univ Hong Kong, Dept Architecture & Civil Engn, Hong Kong, Peoples R China
[2] Tongji Univ, Sch Econ & Management, Shanghai, Peoples R China
Keywords
Government on-site inspection; Non-compliance issues; Graph-based representation; Community detection; Text mining; COMMUNITY DETECTION; NETWORKS; MODEL
DOI
10.1016/j.aei.2023.102163
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Government inspection reports detail unsafe acts and conditions that arise on construction sites, especially front-line managers' non-compliance practices, which are rarely identified during self-inspections. Such information is a valuable learning source for better construction management. However, non-compliance issue records in inspection reports are typically stored as unstructured text, making data analysis challenging. In response, an intelligent text mining framework integrating graph analysis and visualization is presented. The proposed framework comprises data collection, preprocessing, and three levels of text analysis: word, sentence, and document. The main tasks of the word-level analysis are (1) extracting keywords using KeyBERT and (2) identifying non-compliance issue types through community detection in a keyword co-occurrence graph. The sentence-level analysis automatically classifies text data from inspection reports by determining the degree of similarity between texts and communities and assigning the most similar community to each text. The document-level analysis identifies the interrelations between non-compliance issues through association rule mining and a community interaction network. The framework is validated on 6,153 text records featuring non-compliance issues from 322 government on-site inspection reports in Shanghai, China. The results demonstrate that the critical word-level features of non-compliance issues can be accurately identified using KeyBERT, which outperforms other state-of-the-art methods. The approach can also automate the development of a data-driven taxonomy for non-compliance issues and the classification of the corresponding records, requiring less manual intervention than conventional text classification models.
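The word-level and sentence-level steps described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation. The toy keyword lists, the function names, and the `min_weight` threshold are all hypothetical. Connected components over thresholded edges stand in for the community detection algorithm, and Jaccard overlap stands in for the text-to-community similarity measure, neither of which the abstract specifies. Keywords are assumed to have been extracted already (the paper uses KeyBERT for that step).

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_graph(keyword_lists, min_weight=2):
    """Build an undirected keyword graph; keep edges seen in >= min_weight records."""
    counts = Counter()
    for keywords in keyword_lists:
        # Count each unordered keyword pair once per record.
        for a, b in combinations(sorted(set(keywords)), 2):
            counts[(a, b)] += 1
    graph = defaultdict(set)
    for (a, b), weight in counts.items():
        if weight >= min_weight:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def find_communities(graph):
    """Connected components, a simple stand-in for a real community detector."""
    seen, result = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        result.append(component)
    return result


def assign_community(keywords, communities):
    """Return the index of the community with the highest Jaccard overlap."""
    kws = set(keywords)
    scores = [len(kws & c) / len(kws | c) for c in communities]
    return max(range(len(communities)), key=scores.__getitem__)


# Hypothetical keyword lists, one per non-compliance record.
records = [
    ["scaffold", "guardrail", "missing"],
    ["scaffold", "guardrail", "loose"],
    ["permit", "document", "expired"],
    ["permit", "document", "missing"],
]

graph = cooccurrence_graph(records)
comms = find_communities(graph)
label = assign_community(["scaffold", "unstable"], comms)
print(comms, label)
```

In practice, a modularity-based method on the weighted graph and an embedding-based similarity would likely replace the two stand-ins; the sketch only shows how the pieces fit together.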
Pages: 21