Segment as Points for Efficient and Effective Online Multi-Object Tracking and Segmentation

Cited by: 10
Authors
Xu, Zhenbo [1 ]
Yang, Wei [1 ]
Zhang, Wei [2 ]
Tan, Xiao [2 ]
Huang, Huan [1 ]
Huang, Liusheng [1 ]
Affiliations
[1] Univ Sci & Technol China USTC, Hefei 230052, Peoples R China
[2] Baidu Inc, Dept Computer Vis, Beijing 713702, Peoples R China
Keywords
Three-dimensional displays; Image color analysis; Feature extraction; Image segmentation; Automobiles; Motion segmentation; Annotations; Object segmentation; tracking;
DOI
10.1109/TPAMI.2021.3087898
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Current multi-object tracking and segmentation (MOTS) methods follow the tracking-by-detection paradigm and adopt 2D or 3D convolutions to extract instance embeddings for instance association. However, due to the large receptive field of deep convolutional neural networks, the foreground area of the current instance and the surrounding areas containing nearby instances or the environment are usually mixed up in the learned instance embeddings, resulting in ambiguities in tracking. In this paper, we propose a highly effective method for learning instance embeddings based on segments by converting the compact image representation into an unordered 2D point cloud representation. In this way, the non-overlapping nature of instance segments can be fully exploited by strictly separating the foreground point cloud from the background point cloud. Moreover, multiple informative data modalities are formulated as point-wise representations to enrich point-wise features. For each instance, the embedding is learned on the foreground 2D point cloud, the environment 2D point cloud, and the smallest circumscribed bounding box. Then, similarities between instance embeddings are measured for inter-frame association. In addition, to enable the practical utility of MOTS, we modify the one-stage instance segmentation method SpatialEmbedding. The resulting efficient and effective framework, named PointTrackV2, outperforms all state-of-the-art methods, including 3D tracking methods, by large margins (4.8 percent higher sMOTSA for pedestrians than MOTSFusion) while running at near real-time speed (20 FPS evaluated on a single 2080Ti). Extensive evaluations on three datasets demonstrate both the effectiveness and efficiency of our method. Furthermore, as crowded car scenes are scarce in current MOTS datasets, we provide a more challenging dataset named APOLLO MOTS with a much higher instance density.
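The abstract's core idea is to treat an instance segment not as a dense image crop but as an unordered 2D point cloud with point-wise features. A minimal sketch of that conversion is below; the function name, sampling size, and exact feature set (offsets from the bounding-box center plus RGB color) are illustrative assumptions, not the paper's precise design.

```python
import numpy as np

def mask_to_point_cloud(mask, image, n_points=1000, seed=0):
    """Turn a binary instance mask into an unordered 2D point cloud.

    Each sampled foreground pixel becomes one point whose features are its
    (dx, dy) offset from the segment's bounding-box center plus its RGB color.
    Feature choices here are illustrative, not the paper's exact design.
    """
    rng = np.random.default_rng(seed)
    ys, xs = np.nonzero(mask)                        # all foreground pixels
    # bounding-box center of the segment, computed before sampling
    cx = (xs.min() + xs.max()) / 2.0
    cy = (ys.min() + ys.max()) / 2.0
    # draw a fixed-size, unordered subset of foreground pixels
    idx = rng.choice(len(xs), size=n_points, replace=len(xs) < n_points)
    sx, sy = xs[idx], ys[idx]
    offsets = np.stack([sx - cx, sy - cy], axis=1).astype(np.float32)
    colors = image[sy, sx].astype(np.float32) / 255.0  # per-point RGB in [0, 1]
    return np.concatenate([offsets, colors], axis=1)   # shape (n_points, 5)
```

A second point cloud, sampled analogously from the pixels surrounding the mask, would supply the environment features the abstract mentions; both clouds would then be fed to a small point-cloud network (e.g., shared MLP plus max pooling) to produce the instance embedding used for inter-frame association.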
Pages: 6424-6437
Page count: 14