Text2Scene: Text-driven Indoor Scene Stylization with Part-aware Details

Times cited: 10

Authors:

Hwang, Inwoo [1]

Kim, Hyeonwoo [1]

Kim, Young Min [1, 2, 3]

Affiliations:

[1] Seoul Natl Univ, Dept Elect & Comp Engn, Seoul, South Korea

[2] Seoul Natl Univ, Interdisciplinary Program Artificial Intelligence, Seoul, South Korea

[3] Seoul Natl Univ, INMC, Seoul, South Korea

Source:

2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR | 2023

Funding:

National Research Foundation of Singapore

DOI:

10.1109/CVPR52729.2023.00188

Chinese Library Classification:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

We propose Text2Scene, a method to automatically create realistic textures for virtual scenes composed of multiple objects. Guided by a reference image and text descriptions, our pipeline adds detailed textures to labeled 3D geometries in the room such that the generated colors respect the hierarchical structure or semantic parts, which are often composed of similar materials. Instead of applying flat stylization to the entire scene in a single step, we obtain weak semantic cues from geometric segmentation, which are further clarified by assigning initial colors to the segmented parts. We then add texture details to individual objects such that their projections in image space exhibit feature embeddings aligned with the embedding of the input. This decomposition keeps the entire pipeline tractable with a moderate amount of computational resources and memory. Because our framework utilizes existing image and text embeddings, it does not require dedicated datasets with high-quality textures designed by skilled artists. To the best of our knowledge, it is the first practical and scalable approach that can create detailed and realistic textures of the desired style while maintaining structural context for scenes with multiple objects.
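The abstract's per-object objective, rendered views whose image-space feature embeddings align with the embedding of the input, can be illustrated with a minimal sketch. It assumes the embeddings (for example, from a CLIP-style joint image-text encoder) have already been computed as plain vectors, and it scores alignment as 1 minus the mean cosine similarity over views. The function names and the choice of cosine similarity are illustrative assumptions, not the paper's exact loss.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def alignment_loss(view_embeddings: Sequence[Sequence[float]],
                   target_embedding: Sequence[float]) -> float:
    """Hypothetical alignment loss: 1 - mean cosine similarity between the
    embeddings of several rendered views of one object and the embedding of
    the text/reference input. 0.0 means every view matches the target direction."""
    if not view_embeddings:
        raise ValueError("need at least one rendered view")
    sims = [cosine_similarity(v, target_embedding) for v in view_embeddings]
    return 1.0 - sum(sims) / len(sims)


if __name__ == "__main__":
    target = [1.0, 0.0]
    views = [[1.0, 0.0], [0.0, 1.0]]  # one aligned view, one orthogonal view
    print(alignment_loss(views, target))  # → 0.5
```

In an actual texturing pipeline this scalar would be differentiated with respect to the texture parameters through a differentiable renderer; the sketch only shows how per-view agreement with the input embedding can be aggregated into a single score for one object.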

Pages: 1890–1899

Number of pages: 10
