Application of NLP-based topic modeling to analyse unstructured text data in annual reports of construction contracting companies

Cited by: 7
Authors
Murali Jagannathan
Debopam Roy
Venkata Santosh Kumar Delhi
Affiliations
[1] National Institute of Construction Management and Research (School of Construction Management), Department of Civil Engineering
[2] Indian Institute of Technology Bombay
Keywords
Strategy; Construction; NLP; Topic modelling; LDA; NMF
DOI
10.1007/s40012-022-00355-w
Abstract
The construction industry is the backbone of a nation’s economy. It is a matter of great concern that such an industry suffers from time and cost overruns, especially in these challenging times. In addition to the overrun issues, the sector is often criticized for lacking structured secondary data of adequate quality and quantity. Emerging technologies in data science and machine intelligence present a unique opportunity to understand the sector better and to aid effective decision-making. To better understand the utility of such technologies, the Management Discussion and Analysis sections of the annual reports of top publicly listed Indian construction contracting firms are analyzed to identify the presence of ‘strategy themes’ and to map these themes to the organizations considered. Natural Language Processing (NLP)-based topic modeling algorithms, namely Latent Dirichlet Allocation (LDA) and Non-negative Matrix Factorization (NMF), are used in this study to perform a qualitative content analysis that identifies the latent themes. From a methodological standpoint, and in the context of this study, the NMF results are better than the LDA results in accuracy, precision, and recall. The results show that while most construction contracting firms prioritized a ‘revenue-focused’ strategy to expand their order books, a smaller set of large firms seems to prioritize process improvement to raise execution productivity, and these firms are therefore ‘profit margin improvement focused’ or ‘lean-focused’ in their approach. Although a proof of concept, this study unlocks the immense potential of unsupervised NLP-based topic modeling tools to understand and draw inferences from unstructured, freely available text data in the public domain to aid sectoral analysis and policymaking.
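As a rough illustration of the NMF-based topic modeling named in the abstract, the following is a minimal pure-Python sketch. It is not the authors' pipeline: the toy corpus, the use of raw term counts, the choice of two topics, and the helper names (`matmul`, `transpose`, `nmf`) are all assumptions made for illustration. It factors a document-term matrix V into non-negative matrices W (document-topic weights) and H (topic-term weights) using the standard multiplicative update rules.

```python
import random

# Toy corpus: hypothetical snippets, NOT taken from any annual report.
docs = [
    "order book growth new orders revenue growth",
    "revenue growth order inflow new orders",
    "lean process improvement productivity margin",
    "productivity improvement process margin lean",
]

vocab = sorted({w for d in docs for w in d.split()})
# Document-term count matrix V (documents x terms).
V = [[d.split().count(w) for w in vocab] for d in docs]

def matmul(A, B):
    """Plain nested-list matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def nmf(V, k, iters=500, seed=0, eps=1e-9):
    """Approximate V by W @ H with W, H >= 0 (multiplicative updates)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return W, H

W, H = nmf(V, k=2)

# Highest-weighted terms per topic (rows of H).
top_terms = [
    [vocab[j] for j in sorted(range(len(vocab)), key=lambda j: -row[j])[:3]]
    for row in H
]
# Dominant topic per document (rows of W).
doc_topic = [max(range(len(row)), key=lambda t: row[t]) for row in W]

print(top_terms)
print(doc_topic)
```

In this sketch, the rows of H play the role of candidate ‘strategy themes’, and the rows of W indicate how strongly each document (here standing in for a firm's report section) loads on each theme. A real analysis would add text preprocessing and a principled choice of the number of topics, neither of which is specified in this record.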
Pages: 97-106
Page count: 9