Joint Global and Dynamic Pseudo Labeling for Semi-Supervised Point Cloud Sequence Segmentation

Cited by: 2

Authors:

Liu, Jinxian [1]

Chen, Ye [1]

Ni, Bingbing [1]

Yu, Zhenbo [1]

Affiliations:

[1] Shanghai Jiao Tong Univ, Dept Elect Engn, Shanghai 200240, Peoples R China

Source:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY | 2023 / Vol. 33 / No. 10

Funding:

U.S. National Science Foundation;

Keywords:

Point cloud compression; Three-dimensional displays; Data models; Semantics; Predictive models; Task analysis; Shape; Pseudo labeling; semi-supervised; point cloud; sequence; segmentation;

DOI:

10.1109/TCSVT.2023.3253210

Chinese Library Classification:

TM [Electrical Engineering]; TN [Electronics and Communication Technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

Supervised learning is a mainstay for large discriminative models in 3D computer vision, but state-of-the-art performance depends on large amounts of human-annotated data. This limitation is particularly notable for large-scale point cloud sequence segmentation, because point-level annotation is time-consuming and expensive. To overcome this challenge, we develop a novel semi-supervised framework for point cloud sequence segmentation. Specifically, we develop two pseudo-labeling methods, one extracting global semantic information from labeled frames and the other extracting dynamic information from each sequence. The two kinds of generated labels are then combined into more robust pseudo labels (GD-Pseudo labels) for unlabeled frames. Finally, we apply an efficient iterative learning scheme to train a model with a small quantity of human-annotated data and large-scale pseudo-labeled data. Equipped with our framework, the model achieves significant performance improvements (+12-25 mIoU) on SemanticKITTI and Synthia compared with frameworks that do not utilize large amounts of unlabeled data. Moreover, with only 20% of frames annotated on SemanticKITTI, our method achieves performance comparable to state-of-the-art models trained with 100% human-annotated frames.

Pages: 5679-5691

Page count: 13
