Visual surveillance transformer

Cited by: 0
Authors
Choi K.-H. [1 ]
Ha J.-E. [2 ]
Affiliations
[1] Graduate School of Automotive Engineering, Seoul National University of Science and Technology
[2] Department of Mechanical and Automotive Engineering, Seoul National University of Science and Technology
Keywords
Deep learning; Segmentation; Transformer; Visual surveillance;
DOI
10.5302/J.ICROS.2021.21.0143
Abstract
In a visual surveillance system, the same object may need to produce different detection results depending on the configuration of the surrounding environment. To this end, a model for visual surveillance needs to detect an object by understanding its state relative to the environment in the image. In this study, an object segmentation model based on a transformer structure suited to image processing was used to separate the objects in an image into foreground and background. A modified attention structure was proposed for this transformer, and the segmentation results of the models were compared according to the type of input data. © ICROS 2021.
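The abstract describes a modified attention structure but does not give its form. As background, the standard scaled dot-product attention it modifies (from "Attention is all you need", ref. [5] below) can be sketched as follows; this is a minimal NumPy illustration, not the paper's actual model, and all variable names and the token/dimension sizes are illustrative:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(q, k, v):
    # softmax(Q K^T / sqrt(d_k)) V, as in ref. [5].
    d_k = q.shape[-1]
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_k)  # (batch, n_q, n_k)
    weights = softmax(scores, axis=-1)                 # rows sum to 1
    return weights @ v, weights

# Example: self-attention over 8 image-patch tokens of dimension 16.
rng = np.random.default_rng(0)
tokens = rng.standard_normal((1, 8, 16))
out, attn = scaled_dot_product_attention(tokens, tokens, tokens)
print(out.shape, attn.shape)  # (1, 8, 16) (1, 8, 8)
```

In a vision transformer for segmentation, each token typically corresponds to an image patch, so the attention map `attn` relates every patch to every other patch in the frame.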
Pages: 972-977
Page count: 5
Related papers
14 in total
  • [1] Zeng F., Dong B., Wang T., Chen C., Zhang X., Wei Y., MOTR: End-To-End Multiple-Object Tracking with Transformer, (2021)
  • [2] Zheng S., Lu J., Zhao H., Zhu X., Luo Z., Wang Y., Fu Y., Feng J., Xiang T., Torr P.H.S., Zhang L., Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6881-6890, (2021)
  • [3] Zhang H., Chang H., Ma B., Wang N., Chen X., Dynamic R-CNN: Towards high quality object detection via dynamic training, arXiv, 2004, (2020)
  • [4] Wang Y., Xu Z., Wang X., Shen C., Cheng B., Shen H., Xia H., End-to-end video instance segmentation with transformers, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 8741-8750, (2021)
  • [5] Vaswani A., Shazeer N., Parmar N., Uszkoreit J., Jones L., Gomez A.N., Kaiser L., Polosukhin I., Attention is all you need, arXiv, 1706, (2017)
  • [6] Han K., Wang Y., Chen H., Chen X., Guo J., Liu Z., Tang Y., Xiao A., Xu C., Xu Y., Yang Z., A survey on vision transformer, arXiv, 2012, (2020)
  • [7] Khan S., Naseer M., Hayat M., Zamir S.W., Khan F.S., Shah M., Transformers in Vision: A Survey, (2021)
  • [8] Dosovitskiy A., Beyer L., Kolesnikov A., Et al., An image is worth 16x16 words: Transformers for image recognition at scale, arXiv, 2010, (2020)
  • [9] Strudel R., Garcia R., Laptev I., Schmid C., Segmenter: Transformer for Semantic Segmentation, (2021)
  • [10] Chen J., Lu Y., Yu Q., Luo X., Adeli E., Wang Y., Lu L., Yuille A.L., Zhou Y., Transunet: Transformers Make Strong Encoders for Medical Image Segmentation, (2021)