Deep Convolutional Networks for Construction Object Detection Under Different Visual Conditions

Cited by: 41
Authors
Nath, Nipun D. [1 ]
Behzadan, Amir H. [2 ]
Affiliations
[1] Texas A&M University, Zachry Department of Civil Engineering, College Station, TX, USA
[2] Texas A&M University, Department of Construction Science, College Station, TX 77843, USA
Funding
U.S. National Science Foundation
Keywords
visual recognition; deep learning; object detection; computer vision; content retrieval; NEURAL-NETWORKS; RECOGNITION; RESOURCES; CLASSIFICATION; RECONSTRUCTION; ANNOTATION; PROGRESS; KINECT; MODEL;
DOI
10.3389/fbuil.2020.00097
Chinese Library Classification (CLC)
TU [Building Science]
Discipline Classification Code
0813
Abstract
Sensing and reality capture devices are widely used on construction sites. Among different technologies, vision-based sensors are by far the most common and ubiquitous. A large volume of images and videos is collected from construction projects every day to track work progress, measure productivity, litigate claims, and monitor safety compliance. Manual interpretation of such colossal amounts of data, however, is non-trivial, error-prone, and resource-intensive. This has motivated new research on soft computing methods that utilize high-power data processing, computer vision, and deep learning (DL) in the form of convolutional neural networks (CNNs). A fundamental step toward machine-driven interpretation of construction site scenery is to accurately identify objects of interest for a particular problem. The accuracy requirement, however, may offset the computational speed of the candidate method. While computation-heavy DL algorithms (e.g., Mask R-CNN) can perform visual recognition with relatively high accuracy, they suffer from low processing speed, which hinders their use in real-time decision-making. One of the most promising DL algorithms that balances speed and accuracy is YOLO (you-only-look-once). This paper investigates YOLO-based CNN models for fast detection of construction objects. First, a large-scale image dataset, named Pictor-v2, is created, which contains about 3,500 images and approximately 11,500 instances of common construction site objects (e.g., building, equipment, worker). To assess the agility of object detection, transfer learning is used to train two variations of this model, namely YOLO-v2 and YOLO-v3, and to test them on different data combinations (crowdsourced, web-mined, or both). Results indicate that performance is higher when the model is trained on both crowdsourced and web-mined images. Additionally, YOLO-v3 outperforms YOLO-v2 by better detecting smaller, harder-to-detect objects. The best-performing YOLO-v3 model achieves 78.2% mAP when tested on crowdsourced data. Sensitivity analysis of the output shows that the model's strong suit is detecting larger objects in less crowded and well-lit spaces. The proposed methodology can also be extended to predict the relative distance of detected objects with reliable accuracy. Findings of this work lay the foundation for further research on technology-assistive systems that augment human capacities in quickly and reliably interpreting visual data in complex environments.
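To make the detection pipeline described in the abstract concrete, the sketch below runs a single YOLO-v3 inference pass with OpenCV's DNN module on a Darknet-format model. The configuration/weights file names, the input image, and the confidence/NMS thresholds are illustrative assumptions for a model fine-tuned on the three construction classes named above (building, equipment, worker); they are not artifacts released with or described in the paper.

```python
# Minimal YOLO-v3 inference sketch using OpenCV's DNN module.
# File names, class list, and thresholds are illustrative assumptions,
# not the Pictor-v2 artifacts from the paper.
import cv2
import numpy as np

CLASSES = ["building", "equipment", "worker"]   # classes named in the abstract
CONF_THRESH, NMS_THRESH = 0.5, 0.4              # assumed detection thresholds

# Hypothetical Darknet files for a construction-trained YOLO-v3 model.
net = cv2.dnn.readNetFromDarknet("yolov3-construction.cfg",
                                 "yolov3-construction.weights")

image = cv2.imread("site_photo.jpg")            # any construction-site image
h, w = image.shape[:2]

# YOLO-v3 expects a square, scaled blob; 416x416 is the common default size.
blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (416, 416),
                             swapRB=True, crop=False)
net.setInput(blob)
outputs = net.forward(net.getUnconnectedOutLayersNames())

boxes, confidences, class_ids = [], [], []
for out in outputs:
    for det in out:                              # det = [cx, cy, bw, bh, obj, class scores...]
        scores = det[5:]
        class_id = int(np.argmax(scores))
        conf = float(scores[class_id])
        if conf > CONF_THRESH:
            cx, cy, bw, bh = det[0] * w, det[1] * h, det[2] * w, det[3] * h
            boxes.append([int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh)])
            confidences.append(conf)
            class_ids.append(class_id)

# Non-maximum suppression removes overlapping duplicate boxes for the same object.
keep = cv2.dnn.NMSBoxes(boxes, confidences, CONF_THRESH, NMS_THRESH)
for i in np.array(keep).flatten():
    x, y, bw, bh = boxes[i]
    print(f"{CLASSES[class_ids[i]]}: {confidences[i]:.2f} at ({x}, {y}, {bw}, {bh})")
```

A model produced by transfer learning, as the paper describes, would plug into this pipeline unchanged; only the .cfg/.weights pair and the class list differ.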
Pages: 22
Related papers (50 in total; items 21-30 shown)
  • [21] Issaoui, Hanen; ElAdel, Asma; Zaied, Mourad. Object Detection Using Convolutional Neural Networks: A Comprehensive Review. 2024 IEEE 27th International Symposium on Real-Time Distributed Computing (ISORC 2024), 2024.
  • [22] Obeso, Abraham Montoya; Benois-Pineau, Jenny; Vazquez, Mireya Sarai Garcia; Acosta, Alejandro Alvaro Ramirez. Visual vs internal attention mechanisms in deep neural networks for image classification and object detection. Pattern Recognition, 2022, 123.
  • [23] Baker, Nicholas; Lu, Hongjing; Erlikhman, Gennady; Kellman, Philip J. Deep convolutional networks do not classify based on global object shape. PLOS Computational Biology, 2018, 14(12).
  • [24] Marvasti, Ehsan Emad; Raftari, Arash; Marvasti, Amir Emad; Fallah, Yaser P.; Guo, Rui; Lu, Hongsheng. Cooperative LIDAR Object Detection via Feature Sharing in Deep Networks. 2020 IEEE 92nd Vehicular Technology Conference (VTC2020-Fall), 2020.
  • [25] Zhou, Shusen; Chen, Qingcai; Wang, Xiaolong. Convolutional Deep Networks for Visual Data Classification. Neural Processing Letters, 2013, 38(1): 17-27.
  • [26] Kaur, Baljit; Bhattacharya, Jhilik. Scene perception system for visually impaired based on object detection and classification using multimodal deep convolutional neural network. Journal of Electronic Imaging, 2019, 28(1).
  • [27] Quan, Yu; Li, Zhixin; Zhang, Canlong; Ma, Huifang. Object Detection Model Based on Deep Dilated Convolutional Networks by Fusing Transfer Learning. IEEE Access, 2019, 7: 178699-178709.
  • [28] Kang, Kai; Li, Hongsheng; Yan, Junjie; Zeng, Xingyu; Yang, Bin; Xiao, Tong; Zhang, Cong; Wang, Zhe; Wang, Ruohui; Wang, Xiaogang; Ouyang, Wanli. T-CNN: Tubelets With Convolutional Neural Networks for Object Detection From Videos. IEEE Transactions on Circuits and Systems for Video Technology, 2018, 28(10): 2896-2907.
  • [29] Ren, Shaoqing; He, Kaiming; Girshick, Ross; Zhang, Xiangyu; Sun, Jian. Object Detection Networks on Convolutional Feature Maps. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(7): 1476-1481.
  • [30] Olugboja, Adedeji; Wang, Zenghui; Sun, Yanxia. Parallel Convolutional Neural Networks for Object Detection. Journal of Advances in Information Technology, 2021, 12(4): 279-286.