RGB-D Human Matting: A Real-World Benchmark Dataset and a Baseline Method

Cited by: 9
Authors
Peng, Bo [1 ]
Zhang, Mingliang [1 ]
Lei, Jianjun [1 ]
Fu, Huazhu [2 ]
Shen, Haifeng [3 ]
Huang, Qingming [4 ]
Affiliations
[1] Tianjin Univ, Sch Elect & Informat Engn, Tianjin 300072, Peoples R China
[2] A*STAR, Inst High Performance Comp IHPC, Singapore 138632, Singapore
[3] Didi Chuxing, AIoT Platform, Beijing 100193, Peoples R China
[4] Univ Chinese Acad Sci, Sch Comp Sci & Technol, Beijing 100190, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Human matting; RGB-D; dataset; baseline; IMAGE; SEGMENTATION;
DOI
10.1109/TCSVT.2023.3238580
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline Codes
0808; 0809;
Abstract
The last decade has witnessed increasing exploration and development of human matting. However, existing matting works primarily focus on predicting better alpha mattes from RGB images alone. So far, few efforts have been devoted to tackling human matting in real-world activity scenarios with RGB-D information. To this end, this paper concentrates on the RGB-D human matting task and provides the first public RGB-D human matting benchmark dataset, as well as a baseline method for deep learning-based RGB-D human matting. To support research on RGB-D human matting, a new RGB-D human matting dataset (HDM-2K) is collected and released, which contains 2,270 high-resolution human images captured in various real-world scenarios together with their corresponding depth maps. Additionally, a baseline method for RGB-D human matting is proposed, which automatically generates the alpha matte by jointly exploiting the spatial structure information in the depth map and the detailed texture information in the RGB image. Finally, extensive experiments conducted on the HDM-2K dataset demonstrate that depth maps are effective for the matting task and that the proposed baseline method achieves promising performance on human matting.
Pages: 4041-4053
Page count: 13