Active Coarse-to-Fine Segmentation of Moveable Parts from Real Images

Times cited: 1
Authors
Wang, Ruiqi [1]
Patil, Akshay Gadi [1]
Yu, Fenggen [1]
Zhang, Hao [1, 2]
Affiliations
[1] Simon Fraser Univ, Burnaby, BC, Canada
[2] Amazon, Seattle, WA USA
Source
COMPUTER VISION - ECCV 2024, PT XXXIV | 2025 / Vol. 15092
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
DOI
10.1007/978-3-031-72754-2_7
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We introduce the first active learning (AL) model for high-accuracy instance segmentation of moveable parts from RGB images of real indoor scenes. Specifically, our goal is to obtain segmentation results that are fully validated by humans while minimizing manual effort. To this end, we employ a transformer that uses a masked-attention mechanism to supervise the active segmentation. To tailor the network to moveable parts, we introduce a coarse-to-fine AL approach that first uses an object-aware masked attention and then a pose-aware one, leveraging the hierarchical nature of the problem and the correlation between moveable parts and both object poses and interaction directions. When applying our AL model to 2,000 real images, we obtain fully validated moveable part segmentations with semantic labels while needing to manually annotate only 11.45% of the images. This translates to a significant time saving (60%) over the manual effort required by the best non-AL model to attain the same segmentation accuracy. Finally, we contribute a dataset of 2,550 real images with annotated moveable parts and demonstrate its superior quality and diversity over the best alternatives.
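The masked-attention mechanism mentioned in the abstract can be illustrated with a minimal pure-Python sketch of the general technique: scaled dot-product attention in which a query may only attend to key positions allowed by a binary mask, as popularized by Mask2Former. This is not the authors' implementation. The function name `masked_attention`, the list-based inputs, and the fallback of attending to every position when the mask is empty are assumptions made for illustration, and how the paper builds its object-aware and pose-aware masks is not reproduced here.

```python
import math
from typing import List, Sequence


def masked_attention(
    query: Sequence[float],
    keys: Sequence[Sequence[float]],
    values: Sequence[Sequence[float]],
    mask: Sequence[bool],
) -> List[float]:
    """Hypothetical helper for illustration (not the paper's code):
    scaled dot-product attention for one query, restricted to the key
    positions where mask[i] is True."""
    if not (len(keys) == len(values) == len(mask)):
        raise ValueError("keys, values and mask must have the same length")
    # Assumed fallback: if the mask excludes every position, attend to all
    # positions instead of taking a softmax over nothing.
    if not any(mask):
        mask = [True] * len(keys)
    scale = 1.0 / math.sqrt(len(query))
    # Dot-product scores; masked-out positions get -inf so softmax gives 0.
    scores = [
        scale * sum(q * k for q, k in zip(query, key)) if allowed else -math.inf
        for key, allowed in zip(keys, mask)
    ]
    # Numerically stable softmax: subtract the largest (finite) score.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Output is the attention-weighted sum of the value vectors.
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]


# Demo with made-up vectors: only the first position is visible.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
out = masked_attention([1.0, 0.0], keys, values, [True, False, False])
print(out)  # [1.0, 0.0]
```

Setting a masked position's score to negative infinity drives its softmax weight to exactly zero, which is why only the first value vector contributes to the output in the demo.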
Pages: 111–127
Number of pages: 17