Visual surveillance transformer

Cited by: 0
Authors
Choi K.-H. [1 ]
Ha J.-E. [2 ]
Affiliations
[1] Graduate School of Automotive Engineering, Seoul National University of Science and Technology
[2] Department of Mechanical and Automotive Engineering, Seoul National University of Science and Technology
Keywords
Deep learning; Segmentation; Transformer; Visual surveillance;
DOI
10.5302/J.ICROS.2021.21.0143
Abstract
In a visual surveillance system, the same object may need to produce different detection results depending on the configuration of the surrounding environment. To this end, a model for visual surveillance needs to detect an object by understanding its state relative to the environment in the image. In this study, an object segmentation model based on a transformer structure suited to image processing was used to separate the objects in an image into foreground and background. A modified attention structure was proposed for this transformer, and the segmentation results of the models were compared according to the type of input data. © ICROS 2021.
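The abstract describes a modified attention structure but does not give its form. As background, the standard scaled dot-product attention it modifies (from "Attention is all you need", ref. [5] below) can be sketched as follows; this is a minimal NumPy illustration, not the paper's actual model, and all variable names and the token/dimension sizes are illustrative:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(q, k, v):
    # softmax(Q K^T / sqrt(d_k)) V, as in ref. [5].
    d_k = q.shape[-1]
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_k)  # (batch, n_q, n_k)
    weights = softmax(scores, axis=-1)                 # rows sum to 1
    return weights @ v, weights

# Example: self-attention over 8 image-patch tokens of dimension 16.
rng = np.random.default_rng(0)
tokens = rng.standard_normal((1, 8, 16))
out, attn = scaled_dot_product_attention(tokens, tokens, tokens)
print(out.shape, attn.shape)  # (1, 8, 16) (1, 8, 8)
```

In a vision transformer for segmentation, each token typically corresponds to an image patch, so the attention map `attn` relates every patch to every other patch in the frame.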
Pages: 972-977
Page count: 5
Related papers
14 in total
  • [1] Zeng F., Dong B., Wang T., Chen C., Zhang X., Wei Y., MOTR: End-To-End Multiple-Object Tracking with Transformer, (2021)
  • [2] Zheng S., Lu J., Zhao H., Zhu X., Luo Z., Wang Y., Fu Y., Feng J., Xiang T., Torr P.H.S., Zhang L., Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6881-6890, (2021)
  • [3] Zhang H., Chang H., Ma B., Wang N., Chen X., Dynamic R-CNN: Towards high quality object detection via dynamic training, arXiv, 2004, (2020)
  • [4] Wang Y., Xu Z., Wang X., Shen C., Cheng B., Shen H., Xia H., End-to-end video instance segmentation with transformers, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 8741-8750, (2021)
  • [5] Vaswani A., Shazeer N., Parmar N., Uszkoreit J., Jones L., Gomez A.N., Kaiser L., Polosukhin I., Attention is all you need, arXiv, 1706, (2017)
  • [6] Han K., Wang Y., Chen H., Chen X., Guo J., Liu Z., Tang Y., Xiao A., Xu C., Xu Y., Yang Z., A survey on vision transformer, arXiv, 2012, (2020)
  • [7] Khan S., Naseer M., Hayat M., Zamir S.W., Khan F.S., Shah M., Transformers in Vision: A Survey, (2021)
  • [8] Dosovitskiy A., Beyer L., Kolesnikov A., Et al., An image is worth 16x16 words: Transformers for image recognition at scale, arXiv, 2010, (2020)
  • [9] Strudel R., Garcia R., Laptev I., Schmid C., Segmenter: Transformer for Semantic Segmentation, (2021)
  • [10] Chen J., Lu Y., Yu Q., Luo X., Adeli E., Wang Y., Lu L., Yuille A.L., Zhou Y., Transunet: Transformers Make Strong Encoders for Medical Image Segmentation, (2021)