Rewind and Render: Towards Factually Accurate Text-to-Video Generation with Distilled Knowledge Retrieval

Cited by: 0
Authors
Lee, Daniel [1]
Chandra, Arjun [2]
Zhou, Yang [3]
Li, Yunyao [1]
Conia, Simone [4]
Affiliations
[1] Adobe, San Jose, CA 95110 USA
[2] Boston Univ, Boston, MA 02215 USA
[3] Adobe Res, San Francisco, CA USA
[4] Sapienza Univ Rome, Rome, Italy
Source
THIRTY-NINTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, AAAI-25, VOL 39 NO 28 | 2025
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Text-to-Video (T2V) models, despite recent advancements, struggle with factual accuracy, especially for knowledge-dense content. We introduce FACT-V (Factual Accuracy in Content Translation to Video), a system integrating multi-source knowledge retrieval into T2V pipelines. FACT-V offers two key benefits: i) improved factual accuracy of generated videos through dynamically retrieved information, and ii) increased interpretability by providing users with the augmented prompt information. A preliminary evaluation demonstrates the potential of knowledge-augmented approaches in improving the accuracy and reliability of T2V systems, particularly for entity-specific or time-sensitive prompts.
Pages: 29652-29654
Page count: 3