Segment as Points for Efficient and Effective Online Multi-Object Tracking and Segmentation

Cited by: 10
Authors
Xu, Zhenbo [1 ]
Yang, Wei [1 ]
Zhang, Wei [2 ]
Tan, Xiao [2 ]
Huang, Huan [1 ]
Huang, Liusheng [1 ]
Affiliations
[1] Univ Sci & Technol China USTC, Hefei 230052, Peoples R China
[2] Baidu Inc, Dept Computer Vis, Beijing 713702, Peoples R China
Keywords
Three-dimensional displays; Image color analysis; Feature extraction; Image segmentation; Automobiles; Motion segmentation; Annotations; Object segmentation; tracking;
DOI
10.1109/TPAMI.2021.3087898
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Current multi-object tracking and segmentation (MOTS) methods follow the tracking-by-detection paradigm and adopt 2D or 3D convolutions to extract instance embeddings for instance association. However, due to the large receptive field of deep convolutional neural networks, the foreground area of the current instance and the surrounding areas containing nearby instances or the environment are usually mixed up in the learned instance embeddings, resulting in ambiguities in tracking. In this paper, we propose a highly effective method for learning instance embeddings based on segments by converting the compact image representation into an unordered 2D point cloud representation. In this way, the non-overlapping nature of instance segments can be fully exploited by strictly separating the foreground point cloud from the background point cloud. Moreover, multiple informative data modalities are formulated as point-wise representations to enrich point-wise features. For each instance, the embedding is learned on the foreground 2D point cloud, the environment 2D point cloud, and the smallest circumscribed bounding box. Then, similarities between instance embeddings are measured for inter-frame association. In addition, to enable the practical utility of MOTS, we modify the one-stage instance segmentation method SpatialEmbedding. The resulting efficient and effective framework, named PointTrackV2, outperforms all state-of-the-art methods, including 3D tracking methods, by large margins (4.8 percent higher sMOTSA for pedestrians than MOTSFusion) while running at near real-time speed (20 FPS evaluated on a single 2080Ti). Extensive evaluations on three datasets demonstrate both the effectiveness and efficiency of our method. Furthermore, as crowded car scenes are scarce in current MOTS datasets, we provide a more challenging dataset named APOLLO MOTS with a much higher instance density.
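The abstract's core idea is to treat an instance segment not as a dense image crop but as an unordered 2D point cloud with point-wise features. A minimal sketch of that conversion is below; the function name, sampling size, and exact feature set (offsets from the bounding-box center plus RGB color) are illustrative assumptions, not the paper's precise design.

```python
import numpy as np

def mask_to_point_cloud(mask, image, n_points=1000, seed=0):
    """Turn a binary instance mask into an unordered 2D point cloud.

    Each sampled foreground pixel becomes one point whose features are its
    (dx, dy) offset from the segment's bounding-box center plus its RGB color.
    Feature choices here are illustrative, not the paper's exact design.
    """
    rng = np.random.default_rng(seed)
    ys, xs = np.nonzero(mask)                        # all foreground pixels
    # bounding-box center of the segment, computed before sampling
    cx = (xs.min() + xs.max()) / 2.0
    cy = (ys.min() + ys.max()) / 2.0
    # draw a fixed-size, unordered subset of foreground pixels
    idx = rng.choice(len(xs), size=n_points, replace=len(xs) < n_points)
    sx, sy = xs[idx], ys[idx]
    offsets = np.stack([sx - cx, sy - cy], axis=1).astype(np.float32)
    colors = image[sy, sx].astype(np.float32) / 255.0  # per-point RGB in [0, 1]
    return np.concatenate([offsets, colors], axis=1)   # shape (n_points, 5)
```

A second point cloud, sampled analogously from the pixels surrounding the mask, would supply the environment features the abstract mentions; both clouds would then be fed to a small point-cloud network (e.g., shared MLP plus max pooling) to produce the instance embedding used for inter-frame association.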
Pages: 6424-6437
Page count: 14