Text2Scene: Text-driven Indoor Scene Stylization with Part-aware Details

Cited by: 10
Authors
Hwang, Inwoo [1]
Kim, Hyeonwoo [1]
Kim, Young Min [1, 2, 3]
Affiliations
[1] Seoul Natl Univ, Dept Elect & Comp Engn, Seoul, South Korea
[2] Seoul Natl Univ, Interdisciplinary Program Artificial Intelligence, Seoul, South Korea
[3] Seoul Natl Univ, INMC, Seoul, South Korea
Source
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR | 2023
Funding
National Research Foundation, Singapore
DOI
10.1109/CVPR52729.2023.00188
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We propose Text2Scene, a method to automatically create realistic textures for virtual scenes composed of multiple objects. Guided by a reference image and text descriptions, our pipeline adds detailed texture to labeled 3D geometries in the room such that the generated colors respect the hierarchical structure or semantic parts, which are often composed of similar materials. Instead of applying flat stylization to the entire scene in a single step, we obtain weak semantic cues from geometric segmentation, which are further clarified by assigning initial colors to segmented parts. We then add texture details for individual objects such that their projections in image space exhibit feature embeddings aligned with the embedding of the input. The decomposition makes the entire pipeline tractable with a moderate amount of computational resources and memory. As our framework utilizes existing resources for image and text embedding, it does not require dedicated datasets with high-quality textures designed by skilled artists. To the best of our knowledge, it is the first practical and scalable approach that can create detailed and realistic textures of the desired style while maintaining structural context for scenes with multiple objects.
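The embedding-alignment step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not name the embedding model or the exact loss, so the cosine-distance objective, the function names `cosine_similarity` and `alignment_loss`, and the toy two-dimensional vectors below are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def alignment_loss(render_embeddings, target_embedding):
    """Mean cosine distance between rendered-view embeddings and the target.

    Assumption: each rendered view of one object has already been mapped to
    an embedding vector by some image encoder (not modeled here), and the
    target embedding comes from the input text or reference image.
    """
    distances = [
        1.0 - cosine_similarity(e, target_embedding) for e in render_embeddings
    ]
    return sum(distances) / len(distances)


# Toy example: one view matches the target exactly, one is orthogonal to it.
views = [[1.0, 0.0], [0.0, 1.0]]
target = [1.0, 0.0]
loss = alignment_loss(views, target)
```

In a pipeline of the kind the abstract outlines, a loss like this would be evaluated per object rather than over the whole scene at once, which is consistent with the stated goal of keeping computation and memory moderate.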
Pages: 1890-1899
Page count: 10